Bias-Free Neural Predictor

Dibakar Gope and Mikko H. Lipasti, University of Wisconsin – Madison. Championship Branch Prediction 2014.

Page 1: Bias-Free Neural Predictor

Dibakar Gope and Mikko H. Lipasti, University of Wisconsin – Madison

Championship Branch Prediction 2014

Page 2: Executive Summary

Problem:
• Neural predictors show high accuracy
• But at 64KB, correlations are restricted to ~256 branches
• Longer history is still useful (as TAGE showed)
• Bigger hardware increases power and training cost!

Goal: large history with limited hardware

Our solution: filter useless context out of the history

Page 3: Key Terms

• Biased: resolves as taken (T) or not taken (NT) virtually every time
• Non-biased: resolves in both directions

Let’s see an example …

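Page 4: Motivating Example

[Figure: non-biased branch A leads either to a left path through B or to a right path through C; both paths continue through D to non-biased branch E. B, C, and D are biased.]

B, C & D provide no additional information: their outcomes are fixed because they are biased, and which of B or C executes is already determined by A's outcome, so only A is useful context for predicting E.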

Page 5: Takeaway

• NOT all branches provide useful context
• Biased branches resolve as T/NT every time
  – They contribute NO useful information
  – Yet existing predictors include them!
• Branches with no useful context can be omitted

Page 6: Biased Branches

[Figure: biased branches as a percentage of total branches (y-axis, 0 to 80%) for each trace: SPEC00 to SPEC19, FP1 to FP5, INT1 to INT5, MM1 to MM5, SERV1 to SERV5]

Page 7: Bias-Free Neural Predictor

[Figure: a conventional neural predictor indexes a conventional weight table with the GHR; BFN instead filters biased branches out of the GHR to form the BF-GHR, keeps it recency-stack-like, adds positional history and folded path history, and uses a one-dimensional weight table]

Page 8: Idea 1: Filtering Biased Branches

Unfiltered GHR: A X Y B Z B C   (outcomes 1 0 1 0 0 1 0)
Status:         NB B B NB B NB NB   (B = biased, NB = non-biased; the branch named B is itself non-biased)
Bias-Free GHR:  A B B C   (outcomes 1 0 1 0)
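The filtering step can be sketched in C. This is a minimal, hypothetical model rather than the BFN implementation: the `HistEntry` struct, the `filter_biased` function, and the convention that index 0 holds the most recent branch are assumptions, and each branch's biased/non-biased status is taken as given (in BFN it comes from the Branch Status Table).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One dynamic branch instance in the global history (hypothetical layout). */
typedef struct {
    char pc;       /* branch identity; a letter here, to mirror the slide */
    int outcome;   /* 1 = taken, 0 = not taken */
    bool biased;   /* status supplied by a bias detector */
} HistEntry;

/* Copy the non-biased entries of ghr[0..n) (index 0 = most recent) into
 * bf_ghr, preserving their order. Returns how many entries were kept. */
size_t filter_biased(const HistEntry *ghr, size_t n, HistEntry *bf_ghr)
{
    assert(n == 0 || (ghr != NULL && bf_ghr != NULL));
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (!ghr[i].biased) {
            bf_ghr[kept++] = ghr[i];
        }
    }
    return kept;
}
```

Fed the slide's history (A X Y B Z B C with statuses NB B B NB B NB NB), it keeps A, B, B, C with outcomes 1 0 1 0.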

Page 9: Idea 1: Biased Branch Detection

• All branches begin by being considered biased
• Branch Status Table (BST)
  – Direct-mapped
  – Tracks the status of each branch
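A Branch Status Table of this kind can be modelled as a small state machine per entry. The sketch below assumes a 2-bit encoding (unseen, always taken, always not taken, non-biased), a 4096-entry table indexed by `pc % BST_ENTRIES`, and that the non-biased status is sticky; the sizes, names, and encoding are illustrative assumptions, not BFN's exact design. Because the table is direct-mapped and untagged, branches that map to the same entry share one status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 2-bit encoding of a Branch Status Table (BST) entry. */
typedef enum {
    BST_UNSEEN = 0,   /* not observed yet: treated as biased */
    BST_TAKEN,        /* so far always taken: biased */
    BST_NOT_TAKEN,    /* so far always not taken: biased */
    BST_NON_BIASED    /* has resolved in both directions */
} BstState;

#define BST_ENTRIES 4096u   /* assumed size; direct-mapped, untagged */

typedef struct {
    uint8_t state[BST_ENTRIES];
} Bst;

static unsigned bst_index(uint64_t pc)
{
    return (unsigned)(pc % BST_ENTRIES);
}

/* A branch counts as biased until it has resolved in both directions. */
bool bst_is_biased(const Bst *bst, uint64_t pc)
{
    return bst->state[bst_index(pc)] != BST_NON_BIASED;
}

/* Update the entry with a resolved outcome (true = taken). */
void bst_update(Bst *bst, uint64_t pc, bool taken)
{
    uint8_t *s = &bst->state[bst_index(pc)];
    switch (*s) {
    case BST_UNSEEN:
        *s = taken ? BST_TAKEN : BST_NOT_TAKEN;
        break;
    case BST_TAKEN:
        if (!taken) *s = BST_NON_BIASED;
        break;
    case BST_NOT_TAKEN:
        if (taken) *s = BST_NON_BIASED;
        break;
    default:
        break;   /* non-biased stays non-biased in this sketch */
    }
    assert(*s <= BST_NON_BIASED);
}
```

A branch that has only ever been taken keeps reporting biased; the first not-taken outcome flips its entry to non-biased, after which it would be admitted to the BF-GHR.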

Page 10: Idea 2: Filtering Recurring Instances (I)

• Minimize the footprint of a branch in the history
• Assists in reaching very deep into the history

Unfiltered GHR: A B B C A C B   (outcomes 1 0 1 0 0 1 0; all non-biased)
Bias-Free GHR:  A B C   (outcomes 1 0 0)

Only the most recent (leftmost) instance of each branch is kept.

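Page 11: Idea 2: Filtering Recurring Instances (II)

• A recency stack tracks the most recent occurrence of each branch
• It replaces the traditional GHR-like shift register

[Figure: recency stack built from a chain of clocked (CLK) D flip-flops holding PC_x, PC_y, and PC_z; each stage compares (=?) its PC with the incoming non-biased branch PC_nb, with the new history bit h_in as input]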
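The recency stack's behaviour can be modelled in software. On each non-biased branch, the comparators look for an older copy of the same PC; entries above the match shift down one slot (overwriting the old copy), and the new branch is written at the top. With no match, everything shifts and the oldest entry falls off once the stack is full. `RS_DEPTH`, `RecencyStack`, and `rs_push` are hypothetical names, and the depth of 8 is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RS_DEPTH 8   /* assumed depth for illustration */

typedef struct {
    uint64_t pc[RS_DEPTH];
    int outcome[RS_DEPTH];
    size_t count;          /* valid entries; index 0 is the most recent */
} RecencyStack;

/* Insert a non-biased branch. An older instance of the same PC is removed;
 * otherwise the oldest entry is evicted when the stack is full. */
void rs_push(RecencyStack *rs, uint64_t pc, int outcome)
{
    size_t stop = rs->count;              /* slot overwritten by the shift */
    for (size_t i = 0; i < rs->count; i++) {
        if (rs->pc[i] == pc) {            /* the "=?" comparator fires */
            stop = i;
            break;
        }
    }
    if (stop == rs->count) {              /* no match */
        if (rs->count < RS_DEPTH) rs->count++;
        else stop = RS_DEPTH - 1;         /* full: evict the oldest */
    }
    /* Shift entries [0, stop) down by one, then write the new head. */
    for (size_t i = stop; i > 0; i--) {
        rs->pc[i] = rs->pc[i - 1];
        rs->outcome[i] = rs->outcome[i - 1];
    }
    rs->pc[0] = pc;
    rs->outcome[0] = outcome;
    assert(rs->count <= RS_DEPTH);
}
```

Pushing the history from the previous slide oldest-first (B, C, A, C, B, B, A with outcomes 0 1 0 0 1 0 1) leaves A, B, C with outcomes 1 0 0 in the stack, matching the Bias-Free GHR shown there.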

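Page 12: Re-learning Correlations

Unfiltered GHR: A X B C A X B C
Bias-Free GHR before X is detected non-biased: A B C   (table index via hash func.: 1 2 3)
Bias-Free GHR after X is detected non-biased:  A X B C   (table index for A, B, C: 1 3 4)

Once X enters the BF-GHR, B and C move to new positions, so the weights correlating with them must be re-learned.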

Page 13: Idea 3: One-Dimensional Weight Table

• Correlations do NOT depend on a branch's relative depth in the BF-GHR
• So use absolute depths to index the weight table

Unfiltered GHR: A X B C A X B C
Bias-Free GHR: A B C, then A X B C once X is detected non-biased
Table index (hash func.): uses absolute depths, which do not change (A, B, C stay at depths 1, 3, 4)
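The difference between indexing by relative position and by absolute depth shows up in a short C sketch. `w1_index` is a placeholder XOR hash (not BFN's), and `BfEntry`, `w1_sum`, and the 1024-entry table size are assumptions. Each BF-GHR entry records its absolute depth in the unfiltered history; with `use_abs` set, B and C map to the same weights before and after X becomes non-biased, while with relative positions their indices change, which is what forces re-learning.

```c
#include <assert.h>
#include <stdint.h>

#define W1_ENTRIES 1024u   /* assumed size of the one-dimensional table */

static int8_t w1[W1_ENTRIES];   /* one-dimensional weight table */

typedef struct {
    uint64_t pc;          /* PC of a non-biased branch in the BF-GHR */
    unsigned abs_depth;   /* its depth in the unfiltered GHR (1 = most recent) */
} BfEntry;

/* Placeholder hash of (predicted PC, history PC, depth); not BFN's hash. */
static unsigned w1_index(uint64_t pc, uint64_t hist_pc, unsigned depth)
{
    return (unsigned)((pc ^ (hist_pc * 31u) ^ (depth * 97u)) % W1_ENTRIES);
}

/* Perceptron-style sum over the BF-GHR (index 0 = most recent).
 * dir[i] is +1 (taken) or -1 (not taken). With use_abs != 0 each entry is
 * indexed by its absolute depth; otherwise by its relative position i + 1.
 * The chosen table indices are written to idx[0..n). */
int w1_sum(uint64_t pc, const BfEntry *bf, const int *dir, unsigned n,
           int use_abs, unsigned *idx)
{
    int sum = 0;
    for (unsigned i = 0; i < n; i++) {
        assert(dir[i] == 1 || dir[i] == -1);
        unsigned depth = use_abs ? bf[i].abs_depth : i + 1;
        idx[i] = w1_index(pc, bf[i].pc, depth);
        sum += w1[idx[i]] * dir[i];
    }
    return sum;
}
```

With the slide's example, B sits at absolute depth 3 and C at depth 4 whether or not X is in the BF-GHR, so their absolute-depth indices are unchanged; by relative position, B moves from 2 to 3 and C from 3 to 4.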

Page 14: Idea 4: Positional History

    if (some_condition)            // Branch A
        array[10] = 1;
    for (i = 0; i < 100; i++) {    // Branch L
        if (array[i] == 1) {       // Branch X
            /* ..... */
        }
    }

• A recency-stack-like GHR captures the same history across all instances of X, causing aliasing
• Positional history solves that!
• Only one instance of X (i = 10) correlates with A
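The aliasing on this slide can be reproduced with a small C model. It is a hypothetical sketch: branches are single-character labels, `history_before_x` hard-codes the slide's control flow (A, then one L and one X per earlier iteration, then L), and "positional history" is modelled simply as the depth of A in the unfiltered history; how BFN actually encodes positions may differ.

```c
#include <assert.h>
#include <string.h>

#define MAX_HIST 512

/* Unfiltered history of branch labels; br[len - 1] is the most recent. */
typedef struct {
    char br[MAX_HIST];
    int len;
} Hist;

static void record(Hist *h, char br)
{
    assert(h->len < MAX_HIST);
    h->br[h->len++] = br;
}

/* History seen just before branch X is predicted in iteration `iter` of the
 * slide's example: A, then (L, X) for each earlier iteration, then L. */
static void history_before_x(int iter, Hist *h)
{
    h->len = 0;
    record(h, 'A');
    for (int j = 0; j < iter; j++) {
        record(h, 'L');
        record(h, 'X');
    }
    record(h, 'L');
}

/* Recency-stack view: each distinct branch once, most recent first.
 * `out` must have room for every distinct label. Returns the count. */
static int recency_view(const Hist *h, char *out)
{
    int n = 0;
    for (int i = h->len - 1; i >= 0; i--) {
        if (memchr(out, h->br[i], (size_t)n) == NULL) {
            out[n++] = h->br[i];
        }
    }
    return n;
}

/* Positional view: depth of the most recent instance of `br` in the
 * unfiltered history (1 = most recent), or 0 if absent. */
static int depth_of(const Hist *h, char br)
{
    for (int i = h->len - 1; i >= 0; i--) {
        if (h->br[i] == br) {
            return h->len - i;
        }
    }
    return 0;
}
```

Just before X in iterations 5 and 10, the recency-stack view is L X A in both cases, so the two instances alias. The depth of A in the unfiltered history, however, is 12 and 22 (2 × iter + 2), so a position-aware index can single out the instance of X that follows `array[10] = 1`.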

Page 15: Idea 5: Folded Path History

• A influences B differently
  – if the path changes from M-N to X-Y
• Folded history solves that
  – Reduces aliasing on recent histories
  – Prevents collecting noise from distant histories

[Figure: two paths from A to B: A-M-N and A-X-Y]
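One common way to compress a path into a short value is shift-and-XOR folding, as used for the folded histories of TAGE-style predictors. The slide does not give BFN's exact scheme, so the sketch below is an illustration under assumptions: `fold_path` is a hypothetical helper, addresses are assumed 4-byte aligned, and the width and path length are parameters. Bounding `n` limits how far back the path is collected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold the most recent n branch addresses on the path (path[0] = most
 * recent) into a width-bit value: rotate the running value left by one,
 * then XOR in the next address. Because the most recent address is XOR-ed
 * in first, it ends up rotated the furthest. */
uint32_t fold_path(const uint64_t *path, size_t n, unsigned width)
{
    assert(width > 0 && width < 32);
    uint32_t mask = (1u << width) - 1u;
    uint32_t folded = 0;
    for (size_t i = 0; i < n; i++) {
        /* Drop the 2 alignment bits (assumed 4-byte instructions). */
        uint32_t addr = (uint32_t)(path[i] >> 2) & mask;
        folded = ((folded << 1) | (folded >> (width - 1))) & mask;
        folded ^= addr;
    }
    return folded;
}
```

For example, with made-up addresses A = 0x1000, M = 0x1010, N = 0x1020, X = 0x1030, Y = 0x1040 and an 8-bit width, the path N, M, A folds to 0x28 while Y, X, A folds to 0x58, so the two routes from A to B produce different values.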

Page 16: Conventional Perceptron Component

• Some branches have
  – a strong bias towards one direction
  – no correlations at remote histories
• Problem: the BF-GHR cannot outweigh the bias weight during training
• Solution: no filtering for a few recent history bits
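A conventional perceptron over the few most recent unfiltered bits can be sketched as follows. It follows the output and threshold-training rule of the classic perceptron predictor of Jiménez and Lin (train on a mispredict or when |y| ≤ θ, with θ = ⌊1.93h + 14⌋); the 11-bit length is borrowed from the 32KB configuration slide, while the 8-bit saturation limits and all names are assumptions rather than BFN's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define HIST_BITS 11                          /* recent unfiltered bits */
#define THETA ((int)(1.93 * HIST_BITS + 14))  /* training threshold */
#define WMAX 127                              /* assumed 8-bit weights */
#define WMIN (-128)

typedef struct {
    int w[HIST_BITS + 1];   /* w[0] is the bias weight */
} Perceptron;

/* x[i] is +1 (taken) or -1 (not taken) for the i-th most recent branch. */
int perceptron_output(const Perceptron *p, const int *x)
{
    int y = p->w[0];
    for (int i = 0; i < HIST_BITS; i++) y += p->w[i + 1] * x[i];
    return y;
}

static int sat(int v)
{
    return v > WMAX ? WMAX : (v < WMIN ? WMIN : v);
}

/* Train on a mispredict, or when the output magnitude is at most THETA. */
void perceptron_train(Perceptron *p, const int *x, int y, bool taken)
{
    int t = taken ? 1 : -1;
    if ((y >= 0) != taken || abs(y) <= THETA) {
        p->w[0] = sat(p->w[0] + t);
        for (int i = 0; i < HIST_BITS; i++) {
            assert(x[i] == 1 || x[i] == -1);
            p->w[i + 1] = sat(p->w[i + 1] + t * x[i]);
        }
    }
}
```

For an always-taken branch seen with a fixed history, each training step raises the output by 1 + 11 = 12 (the bias weight plus one unit per history bit), so the output goes 0, 12, 24, 36 and training stops once it exceeds THETA = 35.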

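Page 17: BFN Configuration (32KB)

[Figure: the unfiltered GHR indexes a 2-dim weight table; the bias-free GHR indexes a 1-dim weight table through a hash function; the two contributions are summed (+), and a loop predictor's prediction is used instead when the branch is identified as a loop (Is Loop?)]

• Unfiltered: most recent 11 bits
• Bias-free: 36 bits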
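How the pieces of this configuration might combine at prediction time is sketched below. The 11 unfiltered bits and 36 bias-free entries come from the slide; the table sizes, the `Bfn` and `LoopPred` types, row selection by `pc % ROWS`, precomputed 1-dim indices, and the assumption that a detected loop simply overrides the summed output are illustrative choices, not details from the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UNF_BITS 11     /* unfiltered recent history bits (slide) */
#define BF_BITS 36      /* bias-free history entries (slide) */
#define ROWS 256u       /* assumed rows of the 2-dim table */
#define W1_SIZE 4096u   /* assumed size of the 1-dim table */

typedef struct {
    int8_t w2[ROWS][UNF_BITS + 1];   /* 2-dim: [pc row][bias, positions] */
    int8_t w1[W1_SIZE];              /* 1-dim: indexed by a hash */
} Bfn;

/* Hypothetical loop-predictor output. */
typedef struct {
    bool is_loop;
    bool taken;
} LoopPred;

/* ghr[i] and bf_dir[i] are +1/-1; bf_idx[i] are precomputed 1-dim indices. */
bool bfn_predict(const Bfn *t, uint64_t pc, const int *ghr,
                 const int *bf_dir, const unsigned *bf_idx, unsigned bf_n,
                 LoopPred loop)
{
    if (loop.is_loop) return loop.taken;   /* "Is Loop?" selects the loop predictor */
    assert(bf_n <= BF_BITS);
    const int8_t *row = t->w2[pc % ROWS];
    int sum = row[0];                      /* bias weight */
    for (int i = 0; i < UNF_BITS; i++) sum += row[i + 1] * ghr[i];
    for (unsigned i = 0; i < bf_n; i++) sum += t->w1[bf_idx[i] % W1_SIZE] * bf_dir[i];
    return sum >= 0;                       /* non-negative sum predicts taken */
}
```

Summing both tables into a single output lets a strong bias-free correlation overturn a weak bias weight, while the unfiltered part keeps the recent bits available for strongly biased branches.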

Page 18: Contributions of Optimizations

3 Optimizations: 1-dim weight table + phist (positional history) + fhist (folded path history)

• BFN (3 Optimizations): MPKI 3.01
• BFN (ghist bias-free + 3 Optimizations): MPKI 2.88
• BFN (ghist bias-free + RS + 3 Optimizations): MPKI 2.73   (RS = recency stack)

[Figure: mispredictions per 1000 instructions (y-axis, 0 to 4.5) for the three BFN configurations, averaged over SPEC, FP, INT, MM, SERV, and all traces]
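For scale, the three MPKI values listed above correspond to the following relative reductions (plain arithmetic on the reported numbers, not additional results):

```latex
\begin{aligned}
\text{bias-free ghist:}\quad & \frac{3.01 - 2.88}{3.01} = \frac{0.13}{3.01} \approx 4.3\% \\
\text{adding RS:}\quad & \frac{2.88 - 2.73}{2.88} = \frac{0.15}{2.88} \approx 5.2\% \\
\text{total:}\quad & \frac{3.01 - 2.73}{3.01} = \frac{0.28}{3.01} \approx 9.3\%
\end{aligned}
```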

Page 19: Conclusion

• Correlate only with non-biased branches
• Recency-stack-like policy for the GHR
• 3 Optimizations
  – one-dim. weight table
  – positional history
  – folded path history
• 47 bits (11 unfiltered + 36 bias-free) to reach very deep into the history