Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters

Marc Berndl, Benjamin Vitale, Mathew Zaleski, Angela Demke Brown

Research supported by IBM CAS, NSERC, CITO

Interpreter performance

• Why not just JIT?
  • High performance JITs still interpret
  • People use interpreted languages that don’t yet have JITs
  • They still want performance!
• 30-40% of execution time is due to branch misprediction
• Our technique eliminates 95% of branch mispredictions

Overview

• Motivation
• Background: The Context Problem
• Existing Solutions
• Our Approach
• Inlining
• Results

A Tale of Two Machines

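[Figure: two machines side by side. Virtual machine interpreter: a virtual program is loaded into a loaded program, and the execution cycle runs the bytecode bodies. Real machine CPU: the execution cycle runs through the pipeline, supported by predictors for return address, wayness (conditional) and target address (indirect).]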

Interpreter

[Figure: the interpreter. The loaded program is held in an internal representation; the execution cycle repeatedly performs fetch, dispatch, load parms and execute, running the bytecode bodies.]

Running Java Example

Java source:

    void foo() {
        int i = 1;
        do {
            i += i;
        } while (i < 64);
    }

Java bytecode (produced by the javac compiler):

     0: iconst_1
     1: istore_1
     2: iload_1
     3: iload_1
     4: iadd
     5: istore_1
     6: iload_1
     7: bipush 64
     9: if_icmplt 2
    12: return

Switched Interpreter

Virtual program:

    … iload_1  iload_1  iadd  istore_1  iload_1  bipush 64  if_icmplt 2 …

Internal representation (vPC points at the next slot to fetch):

    <iload_1> <iload_1> <iadd> <istore_1> <iload_1> <bipush> 64 <if_icmplt> -7

Switched body implementation:

    while (1) {
        switch (*vPC++) {
        case iload_1:
            ..
            break;
        case iadd:
            ..
            break;
        }
    };

Simple, portable and extremely slow
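A minimal runnable sketch of a switched interpreter for the running example, written in C. The opcode numbering, the one-byte branch offset (a real JVM encodes it in two bytes), the fixed-size operand stack and the name `run_switched` are assumptions made for illustration; the function returns local variable 1 so that the result can be checked.

```c
/* Hypothetical opcode numbering, chosen only for this sketch. */
enum { ICONST_1, ISTORE_1, ILOAD_1, IADD, BIPUSH, IF_ICMPLT, RETURN };

/* Interprets the bytecode of foo() and returns the final value of local 1. */
int run_switched(void) {
    /* Internal representation: opcodes with inline operands. */
    static const signed char code[] = {
        ICONST_1, ISTORE_1,
        ILOAD_1, ILOAD_1, IADD, ISTORE_1,
        ILOAD_1, BIPUSH, 64,
        IF_ICMPLT, -7,          /* offset relative to the if_icmplt slot */
        RETURN
    };
    const signed char *vpc = code;
    int stack[8];
    int sp = 0;
    int local1 = 0;

    while (1) {
        switch (*vpc++) {       /* fetch and dispatch */
        case ICONST_1:
            stack[sp++] = 1;
            break;
        case ISTORE_1:
            local1 = stack[--sp];
            break;
        case ILOAD_1:
            stack[sp++] = local1;
            break;
        case IADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case BIPUSH:
            stack[sp++] = *vpc++;   /* operand follows the opcode */
            break;
        case IF_ICMPLT: {
            int offset = *vpc++;
            int b = stack[--sp];
            int a = stack[--sp];
            if (a < b)
                vpc = (vpc - 2) + offset;   /* vpc - 2 is the opcode slot */
            break;
        }
        case RETURN:
            return local1;
        }
    }
}
```

Calling `run_switched()` doubles the local until it reaches 64 and returns 64. Every virtual instruction pays for the loop branch, the switch's bounds check and table jump, and the `break`, which is the overhead the slide refers to.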

Direct Threaded Interpreter

Virtual program:

    … iload_1  iload_1  iadd  istore_1  iload_1  bipush 64  if_icmplt 2 …

DTT - Direct Threading Table (vPC points at the next slot to fetch):

    &&iload_1  &&iload_1  &&iadd  &&istore_1  &&iload_1  &&bipush  64  &&if_icmplt  -7

Bodies:

    iload_1:
        ..
        goto *vPC++;
    iadd:
        ..
        goto *vPC++;

Target of computed goto is data-driven
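A sketch of the same loop under direct threading. It relies on the labels-as-values extension (`&&label`, `goto *ptr`) supported by GCC and Clang, so it is not ISO C. The label names, the one-slot branch offset and the function name `run_direct_threaded` are assumptions for illustration only.

```c
#include <stdint.h>

/* Runs foo() with direct-threaded dispatch; returns the final value of local 1. */
int run_direct_threaded(void) {
    int stack[8];
    int sp = 0;
    int local1 = 0;

    /* DTT: addresses of bodies, with operands stored inline. */
    void *dtt[] = {
        &&op_iconst_1, &&op_istore_1,
        &&op_iload_1, &&op_iload_1, &&op_iadd, &&op_istore_1,
        &&op_iload_1, &&op_bipush, (void *)(intptr_t)64,
        &&op_if_icmplt, (void *)(intptr_t)-7,
        &&op_return
    };
    void **vpc = dtt;

    goto **vpc++;               /* dispatch the first instruction */

op_iconst_1:
    stack[sp++] = 1;
    goto **vpc++;
op_istore_1:
    local1 = stack[--sp];
    goto **vpc++;
op_iload_1:
    stack[sp++] = local1;
    goto **vpc++;               /* one indirect branch, many possible targets */
op_iadd:
    sp--;
    stack[sp - 1] += stack[sp];
    goto **vpc++;
op_bipush:
    stack[sp++] = (int)(intptr_t)*vpc++;
    goto **vpc++;
op_if_icmplt: {
    intptr_t offset = (intptr_t)*vpc++;
    int b = stack[--sp];
    int a = stack[--sp];
    if (a < b)
        vpc = (vpc - 2) + offset;   /* vpc - 2 is the if_icmplt slot */
    goto **vpc++;
}
op_return:
    return local1;
}
```

`run_direct_threaded()` returns 64. Each body ends in its own indirect branch whose target comes from the table rather than from the code, which is what "data-driven" means here.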

Context Problem

[Figure: animation in three frames. Elements shown: the DTT - Direct Threading Table for the loop (&&iload_1, &&iload_1, &&iadd, &&istore_1, &&iload_1, &&bipush, 64, &&if_icmplt, -7); the iload_1 body ending in goto *vPC++; vPC, with successors iload_1, iadd and bipush; and the indirect branch predictors, whose entries go from <empty> to &&iload_1, &&iadd, &&bipush.]
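The context problem can be made concrete with a small simulation. This is a sketch under a simplifying assumption: each body's dispatch branch is predicted by a single last-target entry (real predictors are more elaborate). The opcode numbering and the name `count_mispredictions` are hypothetical.

```c
#define NUM_OPS 5

/* Hypothetical opcode numbering for the instructions in the loop. */
enum { ILOAD_1, IADD, ISTORE_1, BIPUSH, IF_ICMPLT };

/*
 * trace[k] is the k-th virtual instruction executed. Under direct threading,
 * the indirect branch at the end of body trace[k] jumps to body trace[k + 1].
 * Each body has one predictor entry holding the last target it jumped to.
 * Returns the number of mispredicted dispatches.
 */
int count_mispredictions(const int *trace, int n) {
    int predicted[NUM_OPS];
    int misses = 0;

    for (int op = 0; op < NUM_OPS; op++)
        predicted[op] = -1;             /* entry starts out empty */

    for (int k = 0; k + 1 < n; k++) {
        int branch = trace[k];          /* body whose dispatch branch runs */
        int target = trace[k + 1];      /* body it actually jumps to */
        if (predicted[branch] != target)
            misses++;
        predicted[branch] = target;     /* remember only the last target */
    }
    return misses;
}
```

Fed the loop body iload_1, iload_1, iadd, istore_1, iload_1, bipush, if_icmplt repeatedly, the dispatch out of iload_1 misses every time in steady state, because its three occurrences are followed by iload_1, iadd and bipush in turn. The branches ending the other bodies always see the same successor inside the loop and are predicted after the first iteration.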

Existing Solutions

[Figure: a row of bodies that share GOTO *PC dispatch, whose target is unknown (????), followed by two ways of replicating code.]

• Piumarta & Riccardi: bodies replicated (super instruction)
• Ertl & Gregg: bodies and dispatch replicated (replicas 1 and 2 of iload_1, each with its own GOTO *PC)

Limited to relocatable virtual instructions

Overview

Motivation
Background: The Context Problem
Existing Solutions
• Our Approach
• Inlining
• Results

Key Observation

• Virtual and native control flow have same branch types
  • Linear (not really a branch)
  • Conditional
  • Calls and Returns
  • Indirect
• Hardware has predictors for each type

Solution: Leverage hardware predictors

Essence of our Solution

Virtual program:

    … iload_1  iload_1  iadd  istore_1  iload_1  bipush 64  if_icmplt 2 …

Bytecode bodies (ret terminated):

    iload_1:
        ..
        ret;
    iadd:
        ..
        ret;

CTT - Context Threading Table (generated code):

    ..
    call iload_1
    call iload_1
    call iadd
    call istore_1
    call iload_1

The returns are predicted by the Return Branch Predictor Stack.

Package bodies as subroutines and call them
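Portable C cannot emit native call instructions at load time, so the following is only a hand-written stand-in for what a generated CTT fragment does: straight-line direct calls to bodies that end in a return. The global operand stack and the name `ctt_double_local` are assumptions for illustration.

```c
static int stack[8];    /* operand stack shared by all bodies */
static int sp;          /* operand stack pointer */
static int local1;      /* local variable 1 of the virtual method */

/* Bytecode bodies packaged as subroutines; each one ends by returning. */
static void iload_1(void)  { stack[sp++] = local1; }
static void istore_1(void) { local1 = stack[--sp]; }
static void iadd(void)     { sp--; stack[sp - 1] += stack[sp]; }

/*
 * Stand-in for the CTT of "i += i": one direct call per virtual instruction,
 * in virtual program order. Returns the updated local.
 */
int ctt_double_local(int i) {
    sp = 0;
    local1 = i;
    iload_1();
    iload_1();
    iadd();
    istore_1();
    return local1;
}
```

Because each call site is a distinct native instruction, every return goes back to a different address, which is the context a return-address predictor can exploit. A C compiler may inline these small functions, so the sketch shows the structure rather than the performance.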

Context Threading

Virtual program:

    … iload_1  iload_1  iadd  istore_1  iload_1  bipush 64  if_icmplt 2 …

Bytecode bodies (ret terminated):

    iload_1:
        …
        ret;
    iadd:
        …
        ret;

Virtual branch body:

    if_icmplt:
        …
        goto *vPC++;

CTT (code generated at load time):

    call iload_1
    call iload_1
    call iadd
    call istore_1
    call iload_1
    call bipush
    call if_icmplt

The DTT contains addresses in the CTT, together with the operands 64 and -7; vPC points into the DTT.

Generate calls into the CTT at load time

The Context Threading Table

• A sequence of calls
• A sequence of generated instructions
• An internal representation of the program’s control flow
• Virtual branches are also control flow

Can virtual branches go into the CTT?

Virtual Branches

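[Figure: the Context Threading Table (…, call …, call iload_1, call if_icmplt, …, with a target: label) next to the DTT, which holds the value 5 and is indexed by vPC.]

Virtual branch body:

    if (icmplt) pc = target;
    goto *vPC++;

Context problem is worse for virtual branches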

Specialized Branch Inlining

Branch inlined into the CTT:

    target:
        …
        call …
        call iload_1
        if (icmplt) goto target;

[Figure: also shown are the DTT (holding the value 5), vPC, and the Conditional Branch Predictor Entry used by the inlined branch.]

Inlining conditional branches provides context
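As a hand-written C stand-in for a CTT with the virtual branch inlined, the whole running example can be sketched as below. The real technique generates native code at load time; here the generated sequence is simply written out. Passing the `bipush` operand as an argument, the helper `icmplt` and the name `run_branch_inlined` are simplifying assumptions.

```c
static int stack[8];    /* operand stack shared by all bodies */
static int sp;          /* operand stack pointer */
static int local1;      /* local variable 1 of the virtual method */

/* Bytecode bodies packaged as subroutines. */
static void iconst_1(void) { stack[sp++] = 1; }
static void iload_1(void)  { stack[sp++] = local1; }
static void istore_1(void) { local1 = stack[--sp]; }
static void iadd(void)     { sp--; stack[sp - 1] += stack[sp]; }
static void bipush(int v)  { stack[sp++] = v; }

/* Comparison part of if_icmplt: pops two ints, reports first < second. */
static int icmplt(void) {
    int b = stack[--sp];
    int a = stack[--sp];
    return a < b;
}

/* Stand-in for the CTT of foo(); returns the final value of local 1. */
int run_branch_inlined(void) {
    sp = 0;
    iconst_1();
    istore_1();
target:
    iload_1();
    iload_1();
    iadd();
    istore_1();
    iload_1();
    bipush(64);
    if (icmplt())
        goto target;    /* the virtual branch is now a native conditional branch */
    return local1;
}
```

`run_branch_inlined()` returns 64. The loop's back edge is a conditional branch at a fixed native address, so the hardware's conditional predictor sees one branch per virtual branch.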

Tiny Inlining

• Context Threading is a dispatch technique
  • But, we inline branches
• Some non-branching bodies are very small
  • Why not inline those?

Inline all tiny linear bodies into the CTT

What can go in the CTT?

• Calls to bodies
• Inlined bodies
• Mixed-Mode virtual machine?
• Partially Inlined bodies
• Calls to subroutines that aren’t bytecodes
• Generated code

Performance?

Overview

Motivation
Background: The Context Problem
Existing Solutions
Our Approach
Inlining
• Results

Experimental Setup

• Two Virtual Machines on two hardware architectures.
  • VM: Java/SableVM, OCaml interpreter
  • Arch: P4, PPC
• Branch Misprediction
• Execution Time

Is our technique effective and general?

Mispredicted Taken Branches

[Figure: bar chart of mispredicted taken branches, normalized to direct threading (y axis 0.00 to 1.00), for the benchmarks compress, db, jack, javac, jess, mpeg, mtrt, ray, scimark and soot. Series: Subroutine, Branch Inlining, Tiny Inlining.]

95% mispredictions eliminated on average

Execution time

[Figure: bar chart of execution time, normalized to direct threading (y axis 0.00 to 1.00), for the benchmarks compress, db, jack, javac, jess, mpeg, mtrt, ray, scimark and soot. Series: Subroutine, Branch Inlining, Tiny Inlining.]

27% average reduction in execution time

Execution Time (geomean)

[Figure: bar chart of geometric-mean execution time, normalized to direct threading (y axis 0.00 to 1.00), for java/p4, java/ppc, ocaml/p4 and ocaml/ppc. Series: Subroutine, Branch Inlining, Tiny Inlining.]

Our technique is effective and general

Conclusions

• Context Problem: branch mispredictions due to mismatch between native and virtual control flow
• Solution: Generate control flow code into the Context Threading Table
• Results
  • Eliminate 95% of branch mispredictions
  • Reduce execution time by 30-40%