Bias-Free Neural Predictor

Dibakar Gope and Mikko H. Lipasti
University of Wisconsin – Madison

Championship Branch Prediction 2014


Executive Summary

Problem:
• Neural predictors show high accuracy
• A 64KB budget restricts correlations to ~256 branches
• Longer history is still useful (TAGE showed that)
• Bigger hardware increases power & training cost!

Goal: Large history + limited hardware

Our Solution: Filter useless context out


Key Terms

Biased – Resolve as T/NT virtually every time

Non-Biased – Resolve in both directions

Let’s see an example …


Motivating Example

[Figure: control-flow graph of branches A, B, C, D and E with a left path and a right path. Two branches are labeled Non-Biased and three are labeled Biased.]

B, C & D provide no additional information


Takeaway

• NOT all branches provide useful context

• Biased branches resolve as T/NT every time
  – Contribute NO useful information
  – Existing predictors include them!

• Branches w/ no useful context can be omitted


Biased Branches

[Figure: bar chart of biased branches as a percentage of total branches (y-axis: % of Total Branches, 0 to 80) for traces SPEC00 to SPEC19, FP1 to FP5, INT1 to INT5, MM1 to MM5 and SERV1 to SERV5.]


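Bias-Free Neural Predictor

[Figure: block diagram contrasting a conventional weight table indexed by the GHR with the BFN weight table. In BFN, biased branches are filtered out of the GHR to form the BF-GHR; the design combines a recency-stack-like GHR, positional history, folded path history and a one-dimensional weight table.]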


Idea 1: Filtering Biased Branches

Legend: B = biased, NB = non-biased

Unfiltered GHR:
  Branch:   A   X   Y   B   Z   B   C
  Outcome:  1   0   1   0   0   1   0
  Status:   NB  B   B   NB  B   NB  NB

Bias-Free GHR:
  Branch:   A   B   B   C
  Outcome:  1   0   1   0
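The filtering step can be illustrated with a minimal Python sketch that reproduces the example on this slide. The function name `filter_biased` and the representation of the history as (branch, outcome) pairs are assumptions made for illustration, not the authors' implementation.

```python
def filter_biased(history, non_biased):
    """Keep only the (branch, outcome) pairs whose branch is non-biased."""
    return [(branch, outcome) for branch, outcome in history if branch in non_biased]


# Unfiltered GHR from the slide: branches A X Y B Z B C with outcomes 1 0 1 0 0 1 0.
unfiltered = list(zip("AXYBZBC", [1, 0, 1, 0, 0, 1, 0]))

# X, Y and Z are marked biased on the slide, so only A, B and C survive.
bf_ghr = filter_biased(unfiltered, non_biased={"A", "B", "C"})
print(bf_ghr)
```

The result holds the four entries A, B, B, C with outcomes 1, 0, 1, 0, matching the Bias-Free GHR row of the example.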


Idea 1: Biased Branch Detection

• All branches are initially treated as biased

• Branch Status Table (BST)
  – Direct-mapped
  – Tracks each branch's status
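One way such a table could behave is sketched below. The slide only states that the BST is direct-mapped, tracks status, and that branches start out as biased; the four state names, the table size of 1024 entries and the modulo index are all assumptions made for this sketch.

```python
from enum import Enum


class Status(Enum):
    NOT_FOUND = 0    # branch not seen yet (assumed state)
    TAKEN = 1        # always taken so far, treated as biased
    NOT_TAKEN = 2    # always not taken so far, treated as biased
    NON_BIASED = 3   # has resolved in both directions


class BranchStatusTable:
    def __init__(self, entries=1024):      # size is hypothetical
        self.entries = entries
        self.table = [Status.NOT_FOUND] * entries

    def _index(self, pc):
        return pc % self.entries           # direct-mapped, untagged (assumed hash)

    def update(self, pc, taken):
        """Record a resolved outcome; flip to NON_BIASED on the first disagreement."""
        i = self._index(pc)
        status = self.table[i]
        if status is Status.NOT_FOUND:
            self.table[i] = Status.TAKEN if taken else Status.NOT_TAKEN
        elif (status is Status.TAKEN and not taken) or \
             (status is Status.NOT_TAKEN and taken):
            self.table[i] = Status.NON_BIASED

    def is_non_biased(self, pc):
        return self.table[self._index(pc)] is Status.NON_BIASED


bst = BranchStatusTable()
bst.update(0x40, True)
bst.update(0x40, True)
biased_so_far = not bst.is_non_biased(0x40)   # still considered biased
bst.update(0x40, False)
now_non_biased = bst.is_non_biased(0x40)      # resolved both ways
```

In this sketch a branch never returns to a biased state once it has been seen in both directions; whether the real design allows that is not stated on the slide.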


Idea 2: Filtering Recurring Instances (I)

• Minimize footprint of a branch in the history

• Assists in reaching very deep into the history

Unfiltered GHR:
  Branch:   A   B   B   C   A   C   B
  Outcome:  1   0   1   0   0   1   0

Bias-Free GHR:
  Branch:   A   B   C
  Outcome:  1   0   0

Non-Biased:


Idea 2: Filtering Recurring Instances (II)

• Recency stack tracks most recent occurrence

• Replace traditional GHR-like shift register

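[Figure: recency-stack hardware. A chain of clocked D flip-flop stages (D, Q, CLK) holds the PCs of recent non-biased branches (PC_x, PC_y, PC_z). The PC of the incoming non-biased branch (PC_nb) is compared (=?) against each stored PC, and h_in is the history input to the chain.]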
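The recency-stack policy can be sketched in software as follows. This is an illustrative model, not the hardware shown on the slide: the function names are hypothetical, and it assumes the leftmost entry of the A B B C A C B example under "Filtering Recurring Instances (I)" is the most recent one, an ordering that reproduces the A B C / 1 0 0 result given there.

```python
def recency_stack_update(stack, pc, outcome):
    """Move `pc` to the front with its latest outcome, dropping its older entry."""
    return [(pc, outcome)] + [(p, o) for p, o in stack if p != pc]


def build_bf_ghr(history_newest_first):
    """Replay a history (newest entry first) through the recency stack."""
    stack = []
    for pc, outcome in reversed(history_newest_first):   # oldest branch first
        stack = recency_stack_update(stack, pc, outcome)
    return stack


# Example from "Filtering Recurring Instances (I)", read as newest-first.
history = list(zip("ABBCACB", [1, 0, 1, 0, 0, 1, 0]))
bf_ghr = build_bf_ghr(history)
print(bf_ghr)
```

Each non-biased branch occupies a single slot regardless of how often it recurs, which is what lets a short register cover a long stretch of the unfiltered history.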


Re-learning Correlations

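Unfiltered GHR: A X B C A X B C

[Figure: contents of the Bias-Free GHR (entries drawn from A, B, C and X) when X is detected as non-biased. A hash function maps the history entries to weight-table indices; the index rows shown are 1 2 3 and 1 3 4.]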


Idea 3: One-Dimensional Weight Table

• Branches do NOT depend on relative depths in the BF-GHR

• Use absolute depths to index

[Figure: same example as the previous slide. Unfiltered GHR: A X B C A X B C; the Bias-Free GHR is shown when X is detected as non-biased.]
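The difference between relative and absolute depth can be illustrated with a small sketch. The helper `positions`, the hash in `weight_index`, its multiplier constants and the table size are all assumptions for illustration; the slides do not specify the hash function.

```python
TABLE_SIZE = 1 << 12   # hypothetical table size


def positions(unfiltered, non_biased):
    """Map each non-biased branch to (relative depth in BF-GHR, absolute depth)."""
    result = {}
    relative = 0
    for absolute, branch in enumerate(unfiltered):
        if branch in non_biased:
            result[branch] = (relative, absolute)
            relative += 1
    return result


def weight_index(branch_pc, hist_pc, abs_depth, table_size=TABLE_SIZE):
    """Hash (current branch, history branch, absolute depth) into one flat table."""
    h = (branch_pc * 0x9E3779B1) ^ (hist_pc * 0x85EBCA6B) ^ (abs_depth * 0xC2B2AE35)
    return h % table_size


unfiltered = "AXBC"
before = positions(unfiltered, {"A", "B", "C"})        # X still treated as biased
after = positions(unfiltered, {"A", "X", "B", "C"})    # X detected as non-biased

# B's relative depth shifts when X enters the BF-GHR, but its absolute depth does not.
print(before["B"], after["B"])
```

Because the index depends only on the absolute depth, the entry used for B is the same before and after X is reclassified, so the correlation already learned for B would not have to be re-learned.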



Idea 4: Positional History

if (Some Condition)              // Branch A
    array[10] = 1;

for (i = 0; i < 100; i++)        // Branch L
{
    if (array[i] == 1) { ..... } // Branch X
}

• Only one instance of X correlates w/ A

• A recency-stack-like GHR captures the same history across all instances of X, which causes aliasing

• Positional history solves that!
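The claim that only one instance of X correlates with A can be checked with a small simulation of the code on this slide. This is a Python transcription for illustration only; the function name `x_outcomes` and the parameter `cond_a` (whether the condition at Branch A holds) are assumptions.

```python
def x_outcomes(cond_a, n=100):
    """Result of the test at Branch X in each loop iteration, given Branch A's condition."""
    array = [0] * n
    if cond_a:                  # Branch A
        array[10] = 1
    # Branch X is evaluated once per iteration of the loop (Branch L).
    return [array[i] == 1 for i in range(n)]


# Iterations in which X's outcome depends on A.
differing = [i for i, (with_a, without_a)
             in enumerate(zip(x_outcomes(True), x_outcomes(False)))
             if with_a != without_a]
print(differing)
```

Only iteration 10 differs, so a history that looks the same for every instance of X cannot separate that one correlated instance from the other 99.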


Idea 5: Folded Path History

• A influences B differently
  – If path changes from M-N to X-Y

• Folded history solves that
  – Reduce aliasing on recent histories
  – Prevent collecting noise from distant histories

[Figure: two paths from A to B, Path A-M-N and Path A-X-Y.]
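A folded path history can be sketched as a small register that mixes in the address of each branch on the path. The rotate-and-XOR scheme, the 8-bit width and the branch addresses below are assumptions chosen for illustration; the slides do not give the actual folding function.

```python
def fold(value, bits):
    """XOR-fold an integer down to `bits` bits."""
    mask = (1 << bits) - 1
    folded = 0
    while value:
        folded ^= value & mask
        value >>= bits
    return folded


def folded_path_history(path_pcs, bits=8):
    """Rotate the register left by one and XOR in each folded PC along the path."""
    mask = (1 << bits) - 1
    h = 0
    for pc in path_pcs:
        h = ((h << 1) | (h >> (bits - 1))) & mask   # rotate left by 1
        h ^= fold(pc, bits)
    return h


# Hypothetical addresses for the branches named on the slide.
A, M, N, X, Y = 0x400100, 0x400140, 0x400180, 0x400200, 0x400240

path_amn = folded_path_history([A, M, N])
path_axy = folded_path_history([A, X, Y])
print(hex(path_amn), hex(path_axy))
```

With these addresses the two paths produce different register values, so a weight indexed with the folded value can treat A's influence on B separately for Path A-M-N and Path A-X-Y.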


Conventional Perceptron Component

• Some branches have
  – Strong bias towards one direction
  – No correlations at remote histories

• Problem: BF-GHR cannot outweigh the bias weight during training

• Solution: No filtering for a few recent history bits


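BFN Configuration (32KB)

[Figure: the unfiltered GHR indexes a 2-dim weight table and the Bias-Free GHR indexes a 1-dim weight table; the two outputs are summed (+). A loop predictor with an "Is Loop?" check sits alongside the sum on the way to the final prediction.]

Unfiltered: recent 11 bits
Bias-Free: 36 bits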
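How the components might combine into a final prediction is sketched below. This reading of the diagram is an assumption: the function name, the convention that a confident loop predictor overrides the perceptron sum, and the threshold of zero are illustrative choices, and the weights are assumed to be already selected and signed by the corresponding history outcomes.

```python
def bfn_predict(unfiltered_weights, bias_free_weights, loop_prediction=None):
    """Return True for taken.

    loop_prediction is None when the loop predictor has no confident entry,
    otherwise True or False.
    """
    if loop_prediction is not None:          # the "Is Loop?" check
        return loop_prediction
    # The "+" node: add the contributions of both weight tables.
    total = sum(unfiltered_weights) + sum(bias_free_weights)
    return total >= 0


taken = bfn_predict([3, -1], [2])                          # sum is 4
not_taken = bfn_predict([-3], [1])                         # sum is -2
overridden = bfn_predict([-3], [1], loop_prediction=True)  # loop predictor wins
```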


Contributions of Optimizations

3 Optimizations: 1-dim weight table + phist + fhist

Configuration                                   MPKI
BFN (3 Optimizations)                           3.01
BFN (ghist bias-free + 3 Optimizations)         2.88
BFN (ghist bias-free + RS + 3 Optimizations)    2.73

[Figure: bar chart of mispredictions per 1000 instructions (y-axis 0 to 4.5) for the three configurations above, grouped by SPEC Avg., FP Avg., INT Avg., MM Avg., SERV Avg. and overall Avg.]
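The reported MPKI values can be turned into relative reductions with a few lines of arithmetic. The helper name `relative_reduction` is hypothetical, and the percentages are derived here from the three numbers on the slide rather than reported by the authors.

```python
def relative_reduction(baseline, mpki):
    """Percentage reduction in MPKI relative to the baseline configuration."""
    return (baseline - mpki) / baseline * 100


baseline = 3.01   # BFN (3 Optimizations)
for name, mpki in [("ghist bias-free", 2.88), ("ghist bias-free + RS", 2.73)]:
    print(f"{name}: {relative_reduction(baseline, mpki):.1f}% lower MPKI")
```

Filtering biased branches lowers MPKI by about 4.3%, and adding the recency stack brings the total reduction to about 9.3%.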


Conclusion

• Correlate only w/ non-biased branches

• Recency-stack-like policy for GHR

• 3 Optimizations
  – one-dim weight table
  – positional history
  – folded path history

• 47 history bits (11 unfiltered + 36 bias-free) reach very deep into the history
