Stack Based Allocation in the Azul JVM™

Dr. Cliff [email protected]

Source: cliffc.org/blog/wp-content/uploads/2017/12/2004_SBA.pdf
www.azulsystems.com | ©2005 Azul Systems, Inc.


Background

• The Azul JVM™ is based on Sun HotSpot
  ─ a State-of-the-Art Java VM
• Java is a GC'd language
• HotSpot uses a fast Generational GC
  ─ with “test and bump” allocation
• Test and bump allocation streams through memory
• Streaming memory is hard on caches
• Azul is looking at using Stack Based Allocation


Cost of using a Generational GC

• Typical allocation is pointer compare & bump
• Deallocation is “free”
  ─ Requires scanning the live set
  ─ Live set is much smaller than what was allocated
• Allocation streams through young generation
  ─ Similar to streaming array access in Fortran
• Young generation >> L1 cache size
  ─ Implies caches are flushed repeatedly
• Cache miss cost is spread out
  ─ Hard to spot
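
The "pointer compare & bump" allocation path can be sketched as follows. This is a minimal illustration, not HotSpot's actual code: the class `BumpAllocator`, its fields, and the use of plain `long` values as addresses are all assumptions made for the example.

```java
/** Hypothetical test-and-bump allocator over one fixed-size region. */
final class BumpAllocator {
    private final long limit; // one past the last usable address
    private long top;         // next free address (the allocation point)

    BumpAllocator(long base, long sizeBytes) {
        this.top = base;
        this.limit = base + sizeBytes;
    }

    /** Returns the address of the new object, or -1 if the region is full. */
    long allocate(long sizeBytes) {
        long newTop = top + sizeBytes;
        if (newTop > limit) { // the "test": one pointer compare
            return -1;        // a real VM would start a young-generation GC here
        }
        long result = top;
        top = newTop;         // the "bump"
        return result;
    }
}
```

Because `top` only moves forward between collections, every allocation touches an address that has not been used since the last GC, which is the streaming behavior the slide describes.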


Cost of using a Generational GC

• Allocation streams through young generation
• Cache miss for each new object
  ─ Can cover with prefetch
• Read data is dead
  ─ Object must be initialized
  ─ Can prevent read with “cache-line-zero” ops
• Must evict for each new object
  ─ Evicted data is mostly dead
  ─ Occasional live object
• Result: wasted read AND wasted write


Generational GC

[Figure, animated across six slides: memory usage over time, shown against the L1 cache. Labels: fresh heap (hot, cached memory); allocation point; free heap (cold, uncached memory); old, cold heap (mostly dead). As the allocation point advances into free heap, the captions read "Reading dead memory, filling cache lines" and then "Writing mostly dead cache lines".]


Stack Based Allocation

• Objects are allocated on the stack
  ─ Stack space reserved on function entry
• Deallocate is “free” on function return
• Has similar direct costs to Generational GC
  ─ Orthogonal to Generational GC
• Recycles memory instead of streaming through it
  ─ Has a smaller cache footprint
  ─ Fewer misses, stalls
• Requires knowledge about object lifetime
  ─ Error if object “escapes” function return
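
The difference between streaming and recycling can be seen in a toy cache model. The sketch below is an assumption-laden illustration, not a measurement: `CacheModel` is a hypothetical direct-mapped cache with 256 lines of 64 bytes, and each allocation is modeled as a single touch of the object's first byte.

```java
import java.util.Arrays;

/** Hypothetical direct-mapped cache model; counts line misses only. */
final class CacheModel {
    private final long[] tags;   // line number held by each set, -1 = empty
    private final int lineBytes;
    long misses;

    CacheModel(int lines, int lineBytes) {
        this.tags = new long[lines];
        Arrays.fill(tags, -1L);
        this.lineBytes = lineBytes;
    }

    void touch(long address) {
        long line = address / lineBytes;
        int set = (int) (line % tags.length);
        if (tags[set] != line) { // line not present: miss and replace
            tags[set] = line;
            misses++;
        }
    }

    /** Streaming: every allocation takes the next fresh address. */
    static long streamingMisses(int allocations, int objectBytes) {
        CacheModel cache = new CacheModel(256, 64);
        for (int i = 0; i < allocations; i++) {
            cache.touch((long) i * objectBytes);
        }
        return cache.misses;
    }

    /** Recycling: the same frame-sized window is reused on every call. */
    static long recyclingMisses(int allocations, int objectBytes, int objectsPerFrame) {
        CacheModel cache = new CacheModel(256, 64);
        for (int i = 0; i < allocations; i++) {
            cache.touch((long) (i % objectsPerFrame) * objectBytes);
        }
        return cache.misses;
    }
}
```

With 64-byte objects, the streaming loop misses once per allocation, while the recycling loop misses only while the frame's window is first brought into the cache.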


Stack Based Allocation

[Figure, animated across nine slides: the thread stack (Sam's frame, the callee's frame, free stack, SP, allocation point) shown against the L1 cache while the following code runs.]

Sam { a = new; b = new; Ted(); bad = Uli(); bad.crash; }
Ted { c = new; return; }
Uli { d = new; return d; }

• Call Sam
• Allocate object a
• Allocate object b
• Call Ted
• Allocate object c
• Exit Ted, free c
• Call Uli
• Allocate object d
• Exit Uli, return d

The cache panel shows a, b, c after Ted and a, b, d after Uli.
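
The walkthrough can be replayed in a small model. This is only a sketch: `FrameStack` is a hypothetical class, object names stand in for addresses, and "freed" simply means removed from a live set. It shows why `bad` in Sam refers to storage that no longer exists once Uli returns.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of frame-local allocation. */
final class FrameStack {
    private final Deque<List<String>> frames = new ArrayDeque<>();
    private final Set<String> live = new HashSet<>();

    /** Function entry: push an empty frame. */
    void call() { frames.push(new ArrayList<>()); }

    /** Allocates an object in the current (innermost) frame. */
    String allocate(String name) {
        frames.peek().add(name);
        live.add(name);
        return name;
    }

    /** Function return: everything in the innermost frame is freed at once. */
    void exit() {
        for (String name : frames.pop()) {
            live.remove(name);
        }
    }

    boolean isLive(String name) { return live.contains(name); }

    /** Replays Sam/Ted/Uli; returns true if 'bad' dangles while a and b survive. */
    static boolean replay() {
        FrameStack stack = new FrameStack();
        stack.call();                     // Call Sam
        stack.allocate("a");
        stack.allocate("b");
        stack.call();                     // Call Ted
        stack.allocate("c");
        stack.exit();                     // Exit Ted, free c
        stack.call();                     // Call Uli
        String bad = stack.allocate("d");
        stack.exit();                     // Exit Uli, return d: d's frame is gone
        return !stack.isLive(bad) && stack.isLive("a") && stack.isLive("b");
    }
}
```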


Stack Based Allocation

• Fast allocate and free
• Better cache behavior
• Only works for objects with stack lifetime
• How do we know object lifetime?
• Escape Analysis
  ─ Precise, conservative
  ─ Expensive at compile time
• Escape Detection
  ─ Imprecise, optimistic
  ─ Expensive at run time (requires write barrier)


Escape Detection

• Escape Analysis
  ─ Conservatively correct
  ─ Objects known to never escape
  ─ Allows objects to be en-registered
  ─ Fairly expensive (for a JIT) except in trivial cases
• Escape Detection
  ─ Optimistic; some objects may escape
  ─ Requires a complex write barrier and fix-up
• Write barrier can be done in hardware
• Requires minor hardware change
  ─ Azul is implementing a Java™ VM with custom hardware


Escape Detection Hardware

• Allocate objects in stack frames
• Steal some address bits for frame depth (FID)
• FID bits stored in object pointer
• Hardware ignores FID on a load (getfield)
• Hardware does range check on a store (putfield)
  ─ Storing old FID into new FID object is OK
  ─ Storing new FID into old FID object TRAPS
• Trap requires expensive fixup
  ─ Crawl only new frames on this thread's stack
  ─ Move object, adjust FIDs
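
The store check can be sketched in software to make the rule concrete. The slides do not give the real pointer layout, so everything here is an assumption: the class `FidPointer`, a 16-bit FID in the top bits of a 64-bit pointer, and the convention that a larger FID means a newer (deeper) frame.

```java
/** Hypothetical pointer layout: the top 16 bits hold the frame ID (FID). */
final class FidPointer {
    static final int FID_SHIFT = 48; // assumed bit position of the FID
    static final long ADDRESS_MASK = (1L << FID_SHIFT) - 1;

    static long make(int fid, long address) {
        return ((long) fid << FID_SHIFT) | (address & ADDRESS_MASK);
    }

    static int fid(long pointer) { return (int) (pointer >>> FID_SHIFT); }

    /** Loads (getfield) ignore the FID: only the address bits are used. */
    static long address(long pointer) { return pointer & ADDRESS_MASK; }

    /**
     * Store check (putfield). Storing a reference to a newer-frame object
     * into an older-frame object would let it outlive its frame, so it traps.
     * Storing an older reference into a newer object is fine.
     */
    static boolean storeTraps(long valuePointer, long targetPointer) {
        return fid(valuePointer) > fid(targetPointer);
    }
}
```

For example, with `old = make(1, 0x1000)` and `young = make(3, 0x2000)`, `storeTraps(old, young)` is false and `storeTraps(young, old)` is true.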


Fixup is Expensive

• Fixup must be rare
  ─ Objects from object factories always escape!
• Tag allocation sites with relative allocation depth
• Allocate object in frames with longer lifetimes
  ─ Or directly in heap if needed
• Start small
• Adjust tag upwards with each failure
• Rapidly converges in practice!
• No expensive analysis
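
One way to picture the adaptive depth tag is the sketch below. The slides say only that the tag starts small and moves upwards on each failure; the class `AllocationSite`, the step of one frame per escape, and the `maxDepth` cutoff before falling back to the heap are assumptions for illustration.

```java
/** Hypothetical per-allocation-site tag: how many frames up to allocate. */
final class AllocationSite {
    static final int HEAP = Integer.MAX_VALUE; // sentinel: allocate in the heap
    private final int maxDepth;                // assumed cutoff for stack allocation
    private int depthTag = 0;                  // start small: the allocating frame itself

    AllocationSite(int maxDepth) { this.maxDepth = maxDepth; }

    /** 0 = current frame, 1 = caller's frame, and so on; HEAP = heap allocation. */
    int targetDepth() { return depthTag > maxDepth ? HEAP : depthTag; }

    /** Called by the fixup path each time an object from this site escapes. */
    void recordEscape() {
        if (depthTag <= maxDepth) {
            depthTag++; // adjust the tag upwards after a failure
        }
    }
}
```

A factory method's allocation site would pay for one fixup, move to the caller's frame, and then stop trapping if the object lives no longer than the caller.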


How Big are Frames?

• Check for frame overflow on allocation
  ─ Same as test-and-bump allocation
• If it fits, fine. If not...
• Allocate an overflow area on the side
  ─ Stamp same FID into objects
  ─ Hardware checks FID, not actual stack address
• Adjust call prolog code
  ─ Make a larger frame for next time
• Rapidly converges in practice!
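
The overflow-then-grow loop might look like the following sketch. The class `FrameSizer` and its growth policy (grow the reservation to exactly the demand seen in the overflowing call) are assumptions; the slides say only that the prolog is adjusted to make a larger frame next time.

```java
/** Hypothetical per-method frame sizing that grows after an overflow. */
final class FrameSizer {
    private int reservedBytes; // object space the call prolog reserves

    FrameSizer(int initialBytes) { this.reservedBytes = initialBytes; }

    int reservedBytes() { return reservedBytes; }

    /**
     * Simulates one call that allocates the given object sizes in order.
     * Returns the number of bytes that spilled to the side overflow area.
     */
    int runCall(int... objectSizes) {
        int capacity = reservedBytes; // fixed when the prolog runs
        int used = 0;
        int overflow = 0;
        for (int size : objectSizes) {
            if (used + size <= capacity) { // same test as test-and-bump
                used += size;
            } else {
                overflow += size;          // side area; object keeps the frame's FID
            }
        }
        if (overflow > 0) {
            reservedBytes = used + overflow; // larger frame for next time
        }
        return overflow;
    }
}
```

Starting from a 32-byte reservation, a call that allocates three 16-byte objects spills 16 bytes once; the next identical call fits entirely in the enlarged frame.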


Never-Exit Frames

• Some top-level frames never exit
  ─ Also loops, and JIT may inline
• But will accumulate dead objects
  ─ Objects only freed on a frame exit!
• Periodically need a thread-local GC
  ─ Toss out dead objects
  ─ Compact overflow areas
  ─ Can adjust frame sizes


Inlining helps Site Tagging

• Allocation site behavior may depend on call path
• Same site may want different depth tags depending on caller
• Hot code gets compiled, with inlining
• Inlining clones allocation sites
• Each inlined clone gets its own depth tag


Reverse: Tags Guide Inlining

• Depth tags tell empirical lifetime
  ─ Of objects allocated here
• Likely can prove actual=empirical with Escape Analysis
• Inline up to tagged depth
• Run Escape Analysis on inlined code
  ─ E.A. is cheap in a single compilation unit
  ─ E.A., if successful, is better than Detection
  ─ Can en-register whole objects
• Avoid E.A. if allocation site escapes


Preliminary Results

• Modest sized runs of SpecJVM98, SpecJBB, SJAS
• About ½ of all objects stack allocated
• With 4 MB overflow areas
  ─ Typical stack GC needed every 6-8 sec
  ─ Stack GC takes <1 msec
• Some anomalies observed
• Lots of stack-allocated 2K and 4K arrays
  ─ May be faster to heap allocate
  ─ Because of Azul fast zeroing hardware
  ─ Frames forced too large too quickly
• JIT inlines allocation into loops


Software Escape Barrier

• Azul has precise per-frame E.D. hardware
• Possible to make a software E.D. barrier
• Compare raw stack addresses & trap
  ─ Requires all stacks be on same side of heap
• Store FID just prior to object header in stack
• Trap checks FID before declaring an escape
  ─ load, load, compare, branch
• Trap can fail repeatedly for non-escaping
  ─ Two objects in same frame but wrong order
  ─ No good overflow solution
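
The two-level check and its false positives can be sketched as below. This is a rough model under stated assumptions, not Azul's design: `StackObject` is hypothetical, stacks are assumed to grow downward so a lower address looks newer, a larger FID is assumed to mean a newer frame, and heap objects and overflow areas are not modeled at all.

```java
/** Hypothetical stack object: raw address plus the FID stored before its header. */
final class StackObject {
    final long address;
    final int fid;

    StackObject(long address, int fid) {
        this.address = address;
        this.fid = fid;
    }

    /**
     * Inline barrier: compares raw addresses only.
     * Traps when the stored value looks newer than the object it is stored into.
     */
    static boolean rawCompareTraps(StackObject value, StackObject target) {
        return value.address < target.address;
    }

    /** Trap handler: load both FIDs, compare, branch. */
    static boolean isRealEscape(StackObject value, StackObject target) {
        return value.fid > target.fid;
    }
}
```

Two objects in the same frame show the "wrong order" problem: if `a` sits at 0x7000 and `b` at 0x6FF0, both with FID 2, then storing `b` into `a` trips `rawCompareTraps` every time even though `isRealEscape` reports no escape.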


Summary

• First GC-specific hardware in a long time
• Looks good on paper
• Looks good in preliminary results
• Real Thing is a little ways off yet
  ─ Needs tuning
  ─ Performance-proof in large runs
