Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack...
Transcript of Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack...
![Page 1: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/1.jpg)
Virtual Machine Showdown:Stack versus Registers
David GreggSoftware Systems Lab
Yunhe Shi, Kevin Casey (TCD), Anton Ertl (TU Wien)
![Page 2: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/2.jpg)
Virtual Machines (VM)• High-level language VMs
– Popular for implementing programming languages• Java, C#, Pascal, Perl, Lua
• Program is compiled to virtual machine code– Similar to real machine code– But architecture neutral
• VM implemented on all target architectures– Using interpreter and/or JIT compiler– Same VM code then runs on all machines
![Page 3: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/3.jpg)
Virtual Machines (VM)
• Interpreter is program that emulates VM
• Just-in-time (JIT) compiler translates bytecode to real machine code
Java source file
javac compiler
Java bytecode file
Java VM
interpreter JIT compiler
![Page 4: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/4.jpg)
Why use interpreters?
• Huge advances in JIT compilers in last 15 years– Faster compilation– Better optimization
• But interpreters are still hugely popular for implementing VMs– Why do engineers still build
interpreters?
![Page 5: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/5.jpg)
Why build interpreters?• Speed
– Hybrid JIT/interpreter implementations– Real-time systems
• Memory– Size of VM and generated code– Hybrids, embedded systems, code compression
• Software engineering– Portability– Simplicity: cost, correctness– Safety: sandboxing of executing code– Tools: debugger, profiler– Dynamic language features
![Page 6: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/6.jpg)
Stack Architecture
• Almost all real computers use a register architecture– Values loaded to registers– Operated on in registers
• But most popular VMs use stack architecture– Java VM, .NET VM, Pascal P-
code, Perl 5 1325
iadd
sp
iadd r0, r1, r2
r21325
r3r1r0 r4
![Page 7: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/7.jpg)
Stack Architecture
• Almost all real computers use a register architecture– Values loaded to registers– Operated on in registers
• But most popular VMs use stack architecture– Java VM, .NET VM, Pascal P-
code, Perl 5 38
iadd
sp
iadd r0, r1, r2
r21325
r3r1r0 r438
![Page 8: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/8.jpg)
Why stack VMs?
• Code density– No need to specify register numbers
• Easy to generate stack code– No register allocation
• No assumptions about number of registers– ????
• Speed– May be easier to JIT compile– May be faster to interpret
• Or maybe not…
![Page 9: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/9.jpg)
Which VM interpreter is faster?
• Stack VM interpreters– Operands are located on stack– No need to specify location of operands– No need to load operand locations
• Register VM interpreters– Fewer VM instructions needed
• Less shuffling of data onto/off stack– Each VM instruction is more expensive
![Page 10: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/10.jpg)
Which VM interpreter is faster?
• Question debated repeatedly over the years– Many arguments, small examples– No hard numbers
• Some are confident that answer is obvious– But which answer?
![Page 11: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/11.jpg)
VM Interpreters • Emulate a virtual instruction set• Track state of virtual machine
– Virtual instruction pointer (IP)– Virtual stack
• Array in memory• With virtual stack pointer (SP)
– Virtual registers• Array in memory• No easy way to map virtual registers to real registers in
an interpreter
![Page 12: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/12.jpg)
VM Interpreterswhile ( 1 ) { ip++; opcode = *ip; switch ( opcode ) { case IADD: *(sp-1) = *sp + *(sp-1); sp--; break; case ISUB: *(sp-1) = *sp – *(sp-1); sp--; break; case ILOAD_0: *(sp+1) = locals[0]; sp++; break; case ISTORE_0: locals[0] = *sp; sp--; break; ……… }}
![Page 13: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/13.jpg)
VM Interpreters• Dispatch
– Fetch opcode & jump to implementation– Usually most expensive part of execution– Unpredictable indirect branch– Similar cost for both VM types– But register VM needs fewer dispatches
• Fetch operands– Locations are explicit in stack machine
• Perform the operation– Often cheapest part of execution
![Page 14: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/14.jpg)
Stack versus registers
• Our register VM– Simple translation from JVM bytecode– One byte register numbers
Source code Stack code Register codea = b + c; iload b; iadd a, b, c
iload c;iadd;istore a;
![Page 15: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/15.jpg)
Operand Access• Stack machine
– Virtual stack in array– Operands on top of stack– Stack pointer updates
• Register machine– Virtual registers in array– Must fetch operand locations (1-3 extra bytes)
• More loads per VM instruction
![Page 16: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/16.jpg)
From Stack to Register• Translated JVM code to register VM• Local variables mapped directly
– Local 0 → Register 0• Stack locations
– Mapped to virtual registers– Height of stack is always known statically– Assign numbers to stack locations
![Page 17: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/17.jpg)
From Stack to Register
Stack Register Code CommentCodeiload 4 imove r10, r4 ; load local variable 4bipush 57 biload r11, 57 ; push immediate 57iadd iadd r10, r10, r11 ; integer addistore 6 imove r6, r10 ; store TOS to local 6iload 6 imove r10, r6 ; load local variable 6ifeq 7 ifeq r10, 7 ; branch by 7 if TOS==0
![Page 18: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/18.jpg)
From Stack to Register• Clean up register code with classical
optimizations– Copy propagation to remove unnecessary
move operations– Simple redundancy elimination
• Re-use constants already in registers• Stack VM consumes its operands so must load
constants every time it uses them
![Page 19: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/19.jpg)
Experimental Setup• Implemented in Cacao VM• Method is JIT compiled to register code on
first invocation– Results include only executed methods
• Standard benchmarks– SPECjvm98, Java Grande
• Real implementation wouldn’t translate– Better generate register code from source– But translation allows fairer comparison
• Except for translation time
![Page 20: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/20.jpg)
Executed VM Instructions
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Compress
Jess
Db
Javac
Mpegaudio
Mtrt
Jack
MolDyn
RayTracer
Euler
MonteCarlo
Search
Average
Pop Move Eliminated Constant Eliminated Others Eliminated Move Remaining Constant Remaining Others Remaining
![Page 21: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/21.jpg)
Increase in bytecode loads
![Page 22: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/22.jpg)
Ratio of additional loads to eliminated instructions
![Page 23: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/23.jpg)
Real machine memory opsSource Code Stack Codea = b + c; /* iload c */
*(++sp) = locals[c];
Register Code /* iload b *//* iadd a, b, c */ *(++sp) = locals[b];reg[a] = reg[b] + reg[c];
/* iadd */*(sp-1) = *(sp-1) + *sp;sp--;
/* istore a */locals[a] = *(sp--);
![Page 24: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/24.jpg)
Reduction in “real machine” loads/stores compared with dispatches eliminated
![Page 25: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/25.jpg)
Real Running Times• Hardware platforms
• AMD 64• Intel P4• Intel Core 2 Duo• Digital Alpha• IBM PowerPC
![Page 26: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/26.jpg)
Interpreter Dispatch• Switch dispatch
• Large case statement in a loop• Token Threaded dispatch
• Interpreter is set of routines that jump from one to another
• Requires table of machine code addresses• Direct threaded dispatch
• Replace bytecode with sequence of machine code addresses
• Inline threaded dispatch• Simple macro-expanding JIT compiler
![Page 27: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/27.jpg)
Speedup of Register VM – AMD64
0.50
1.00
1.50
2.00
Compr
ess
Jess Db
Java
c
Mpe
gaud
ioM
trtJa
ck
MolD
yn
RayTr
acer
Euler
Mon
teCar
lo
Searc
h
Avera
ge
Inline Threaded Direct Threaded Token Threaded Switch
![Page 28: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/28.jpg)
AMD64 Event Counters - Compress
0.0
0.2
0.4
0.6
0.8
1.0
1.2
Data cacheaccesses(*100B)
Data cachemisses (*200M)
Instruction cachefetches (*200B)
Instruction cachemisses (*2M)
Retired takenbranches (*25B)
Retired takenbranches
mispredicted(*25B)
Retiredinstructions
(*160B)
Register Inline Threaded Stack Inline Threaded Register Direct Threaded
Stack Direct Threaded Register Switch Stack Switch
![Page 29: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/29.jpg)
Eliminating more redundant expressions
• Stack operations consume their operands– So very difficult to re-use existing values– Stack machine must load constants, loop
invariants repeatedly– Register machine can store constants, simple
loop invariants in registers• What about more complex invariants?
– Repeated loads from the heap– Requires very sophisticated pointer analysis
• But what if we could do it?
![Page 30: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/30.jpg)
Eliminating more redundant expressions – speedup on AMD 64
0.50
1.00
1.50
2.00
2.50
3.00
Compr
ess
Jess Db
Java
c
Mpe
gaud
ioM
trtJa
ck
MolD
yn
RayTr
acer
Euler
Mon
teCar
lo
Searc
h
Avera
ge
Inline Threaded Direct Threaded Token Threaded Switch
These results are not safe!
![Page 31: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/31.jpg)
Summary• Detailed quantitative results
– 46% reduction in executed VM instructions– 26% increase in bytecode size– 25% increase in bytecode loads
• Speedup depends on dispatch scheme– Speedup 1.48 with switch dispatch on AMD64– Even with the most efficient dispatch, 1.15
speedup can still be achieved
![Page 32: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/32.jpg)
Real world VMs• Register vs. stack debate ongoing for
many years– Perl 6 VM started in 2000
• Version 1.0 March 2009• Partially motivated our work
• Some newish register VMs– Lua 5 VM– SquirrelFish– Google Dalvik VM in Android– Unladen Swallow (??)
![Page 33: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/33.jpg)
Conclusions• Register VM offers faster execution
of the VM interpreter at the cost of greater bytecode size
• Detailed results can be found in ACMTACO, January 2008
![Page 34: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/34.jpg)
Thank you.
![Page 35: Virtual Machine Showdown: Stack versus Registers David Gregg · Virtual Machine Showdown: Stack versus Registers David Gregg Software Systems Lab Yunhe Shi, Kevin Casey (TCD), Anton](https://reader035.fdocuments.net/reader035/viewer/2022071214/6043d05477810c57ec362034/html5/thumbnails/35.jpg)
Dispatch Comparison – AMD 64
0.0
1.0
2.0
3.0
4.0
5.0
6.0
Compr
ess
Jess Db
Java
c
Mpe
gaud
ioM
trtJa
ck
MolD
yn
RayTr
acer
Euler
Mon
teCar
lo
Searc
h
Avera
ge
Register inline-threaded Register direct-threaded Register token-threaded
Register switch Stack inline-threaded Stack direct-threaded
Stack token-threaded Stack switch SUN JDK 1.6.0 (Intrp. Only)