Garbage Collection Mythbusters


Page 1: Garbage Collection Mythbusters
Page 2: Garbage Collection Mythbusters

Garbage Collection Mythbusters
Simon Ritter
Java Technology Evangelist

Page 3: Garbage Collection Mythbusters

The Goal

• Cover the strengths and weaknesses of garbage collection
  • What GC does well
  • And what not so well

Page 4: Garbage Collection Mythbusters

Tracing GC: A Refresher Course

• Tracing-based garbage collectors:
  • Discover the live objects
    • All objects transitively reachable from a set of “roots” (“roots”: known live references that exist outside the heap, e.g., thread stacks, virtual machine data)
  • Deduce that the rest are dead
  • Reclaim them
• An “indirect” GC technique
• Examples
  • Mark & Sweep
  • Mark & Compact
  • Copying
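The discover/deduce steps above can be sketched as a plain graph traversal. This is a minimal illustration, not how any particular VM implements marking; `Node`, `Tracer`, `mark`, and `sweep` are hypothetical names invented for the example, and the "heap" is just a list of nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical heap object: a name plus its outgoing references.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

final class Tracer {
    // Discover the live objects: everything transitively reachable from the roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {        // first visit: mark it, then scan its references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // Deduce that the rest are dead: every heap object that was not marked.
    static List<Node> sweep(List<Node> heap, Set<Node> marked) {
        List<Node> dead = new ArrayList<>();
        for (Node n : heap) {
            if (!marked.contains(n)) dead.add(n);
        }
        return dead;
    }
}
```

Note that the collector never inspects a dead object to decide it is dead; it only finds what is live, which is why the technique is called "indirect".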

Page 5: Garbage Collection Mythbusters

Tracing GC Example

[Figure: a heap containing objects A through M, and the Runtime Stack holding references into the heap.]

Page 6: Garbage Collection Mythbusters

Tracing GC Example

The Runtime Stack is considered live by default. We start tracing transitively from it and mark objects we reach.

[Figure: the same heap of objects A through M; tracing starts from the Runtime Stack.]

Page 7: Garbage Collection Mythbusters

Tracing GC Example

[Figure: tracing in progress; objects reached from the Runtime Stack are marked.]

Page 8: Garbage Collection Mythbusters

Tracing GC Example

[Figure: tracing in progress; objects reached from the Runtime Stack are marked.]

Page 9: Garbage Collection Mythbusters

Tracing GC Example

We identified all reachable objects. We can deduce that the rest are unreachable and, therefore, dead.

[Figure: the heap with all reachable objects marked; the unmarked objects are dead.]

Page 10: Garbage Collection Mythbusters

Alternative 1: In-place Deallocation

Keep track of the free space explicitly (using free lists, a buddy system, a bitmap, etc.).

[Figure: the heap after reclamation; live objects A, B, C, E, H, I, K, L stay where they were, with free space between them.]

Page 11: Garbage Collection Mythbusters

Alternative 2: Sliding Compaction

Slide all live objects to one end of the heap. All free space is located at the other end of the heap.

[Figure: live objects A, B, C, E, H, I, K, L packed together at one end of the heap.]
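A toy version of sliding compaction, assuming the heap is modeled as an array of slots where `null` means free. `SlidingCompactor` and `compact` are hypothetical names for this sketch; a real compactor must also update every reference to each moved object, which is omitted here.

```java
final class SlidingCompactor {
    // Slides all non-null entries to the low end of the array, preserving their order.
    // Returns the index of the first free slot after compaction.
    static int compact(String[] heap) {
        int free = 0;                          // next destination slot
        for (int scan = 0; scan < heap.length; scan++) {
            if (heap[scan] != null) {
                String obj = heap[scan];
                heap[scan] = null;             // vacate the old location
                heap[free++] = obj;            // place the object at the low end
            }
        }
        return free;
    }
}
```

After the call, all free space is one contiguous range starting at the returned index.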

Page 12: Garbage Collection Mythbusters

Copying GC Example

[Figure: the Runtime Stack and two heap semispaces labeled To-Space and From-Space; all objects sit in one semispace and the other is empty.]

Page 13: Garbage Collection Mythbusters

Copying GC Example

[Figure: copying in progress; objects copied so far: A, C.]

Page 14: Garbage Collection Mythbusters

Copying GC Example

[Figure: copying in progress; objects copied so far: A, C, H, K.]

Page 15: Garbage Collection Mythbusters

Copying GC Example

[Figure: copying in progress; objects copied so far: A, B, C, H, K.]

Page 16: Garbage Collection Mythbusters

Copying GC Example

[Figure: copying in progress; objects copied so far: A, B, C, H, K, L.]

Page 17: Garbage Collection Mythbusters

Copying GC Example

[Figure: copying complete; only the live objects A, B, C, E, H, I, K, L remain, packed together in one semispace.]

Page 18: Garbage Collection Mythbusters


Myth 1:

malloc/free always perform better than GC.

Page 19: Garbage Collection Mythbusters

Object Relocation

• GC enables object relocation, which in turn enables
  • Compaction: eliminates fragmentation
  • Generational GC: decreases GC overhead
  • Linear Allocation: best allocation performance
    • Fast path: ~10 native instructions, inlined, no sync

[Figure: linear allocation. Used space lies below top and free space lies between top and end; a new object is placed at top, and top advances to new top.]
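Linear allocation as pictured above can be sketched in a few lines. This is a single-threaded illustration with addresses modeled as plain integers; `BumpAllocator` and its methods are hypothetical names, not a VM API.

```java
// Hypothetical bump-pointer allocator over the address range [start, end).
final class BumpAllocator {
    private int top;          // first free address
    private final int end;    // one past the last usable address

    BumpAllocator(int start, int end) {
        this.top = start;
        this.end = end;
    }

    // Fast path: one addition and one bounds check.
    // Returns the address of the new object, or -1 when the region is full
    // (in a collected heap, this is the point where a GC would be needed).
    int allocate(int size) {
        int newTop = top + size;
        if (newTop > end) return -1;
        int result = top;
        top = newTop;
        return result;
    }

    int top() { return top; }
}
```

There is no free list to search and no per-object header to update on allocation, which is why the fast path stays so short.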

Page 20: Garbage Collection Mythbusters

Generational GC is Fast!

• Compare costs (first-order approximation)
  • malloc/free:
    • all_objects * cost_malloc + freed_objects * cost_free
  • Generational GC with copying young generation:
    • all_objects * cost_linear_alloc + surviving_objects * cost_copy
• Consider:
  • cost_linear_alloc much less than cost_malloc
  • surviving_objects often 5% or less of all_objects
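The two cost formulas above are easy to evaluate side by side. The sketch below only encodes the formulas; `GcCostModel` and its method names are made up for illustration, and any unit costs you plug in are your own assumptions, not measurements.

```java
final class GcCostModel {
    // all_objects * cost_malloc + freed_objects * cost_free
    static double mallocFreeCost(long allObjects, long freedObjects,
                                 double costMalloc, double costFree) {
        return allObjects * costMalloc + freedObjects * costFree;
    }

    // all_objects * cost_linear_alloc + surviving_objects * cost_copy
    static double generationalCost(long allObjects, long survivingObjects,
                                   double costLinearAlloc, double costCopy) {
        return allObjects * costLinearAlloc + survivingObjects * costCopy;
    }
}
```

For example, with purely hypothetical unit costs (malloc 50, free 30, linear allocation 10, copy 200), 1,000,000 objects of which 950,000 are freed and 50,000 survive give 78,500,000 for malloc/free against 20,000,000 for the generational scheme. Copying is expensive per object, but it is paid only for the few survivors.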

Page 21: Garbage Collection Mythbusters

GC vs. malloc Study

• Recent publication shows
  • When space is tight
    • malloc/free outperform GC
  • When space is ample
    • GC can match (or better) malloc/free
• GC just as fast
  • if given “breathing room”
• Matthew Hertz and Emery Berger
  • Quantifying the Performance of Garbage Collection vs. Explicit Memory Management, In Proceedings of OOPSLA 2005, October 2005

Page 22: Garbage Collection Mythbusters

Object Relocation: Other Benefits

• Compaction: can improve page locality
  • Fewer TLB misses
  • Cluster objects to improve locality
    • Important on NUMA architectures
• Relocation ordering: can improve cache locality
  • Fewer cache misses
• The important points:
  • Allocation and reclamation are fast
  • Relocation can boost application performance

Page 24: Garbage Collection Mythbusters

Myth 1:

malloc/free always perform better than GC.

Busted!

Page 25: Garbage Collection Mythbusters


Myth 2:

Reference counting would solve all my GC problems.

Page 26: Garbage Collection Mythbusters

Reference Counting

• Each object holds a count
  • How many references point to it
  • Increment it when a new reference points to the object
  • Decrement it when a reference to the object is dropped
• When reference count reaches 0
  • Object is unreachable
  • It can be reclaimed
• A “direct” GC technique
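The rules above can be sketched as follows. This is a toy model of reference counting layered on ordinary Java objects, assuming every reference change goes through the helper methods; `RcObject`, `retain`, `release`, `addRef`, and `dropRef` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference-counted object.
final class RcObject {
    final String name;
    int count;                                    // how many references point here
    boolean reclaimed;
    final List<RcObject> fields = new ArrayList<>();

    RcObject(String name) { this.name = name; }

    // A reference from outside the heap (e.g., the stack) starts pointing at o.
    static void retain(RcObject o) { o.count++; }

    // A reference to o is dropped; at count 0 the object is reclaimed on the spot.
    static void release(RcObject o) {
        if (--o.count == 0) {
            o.reclaimed = true;
            // Reclaiming o drops every reference it held, which may cascade.
            for (RcObject child : new ArrayList<>(o.fields)) release(child);
            o.fields.clear();
        }
    }

    // This object stores a new reference to target.
    void addRef(RcObject target) {
        fields.add(target);
        target.count++;
    }

    // This object drops one of its references to target.
    void dropRef(RcObject target) {
        if (fields.remove(target)) release(target);
    }
}
```

Unlike tracing, the count tells the system directly, at the moment it reaches 0, that this particular object is dead, which is what makes the technique "direct".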

Page 27: Garbage Collection Mythbusters

Reference Counting Example

[Figure: the heap with objects A, B, C, E, F, H, I, J, K, L, M, each annotated with its reference count, and the Runtime Stack.]

Page 28: Garbage Collection Mythbusters

Reference Counting Example

Delete reference, decrease H's RC

[Figure: the same heap; a reference to H is being deleted.]

Page 29: Garbage Collection Mythbusters

Reference Counting Example

[Figure: the same heap; H's reference count is now 0.]

Page 30: Garbage Collection Mythbusters

Reference Counting Example

Decrease K's & L's RCs, reclaim H

[Figure: the same heap; H, with reference count 0, is about to be reclaimed.]

Page 31: Garbage Collection Mythbusters

Reference Counting Example

[Figure: H has been reclaimed; K's reference count is now 0.]

Page 32: Garbage Collection Mythbusters

Reference Counting Example

Reclaim K

[Figure: the heap without H; K, with reference count 0, is about to be reclaimed.]

Page 33: Garbage Collection Mythbusters

Reference Counting Example

[Figure: H and K have been reclaimed; objects A, B, C, E, F, I, J, L, M remain, all with nonzero reference counts.]

Page 34: Garbage Collection Mythbusters

Traditional Reference Counting

• Extra space overhead
  • One reference count per object
• Extra time overhead
  • Up to two reference count updates per reference field update
  • Very expensive in a multi-threaded environment
• Non-moving
  • Fragmentation
• Not always incremental or prompt
• Garbage cycles
  • Counts never reach 0
  • Cannot be reclaimed
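A minimal sketch of why cycles defeat plain reference counting. `CountedNode` and `CycleDemo` are hypothetical names; each node holds at most one outgoing reference, which is enough to build a two-object cycle. Note that `pointTo` performs up to two count updates (one increment, one decrement) for a single field write.

```java
// Hypothetical reference-counted node with a single outgoing reference.
final class CountedNode {
    int count;
    CountedNode next;

    // Overwrite the reference field: increment the new target, decrement the old one.
    void pointTo(CountedNode target) {
        if (target != null) target.count++;
        CountedNode old = next;
        next = target;
        if (old != null) old.release();
    }

    // Drop one reference to this node; at 0, also drop the reference it holds.
    void release() {
        if (--count == 0 && next != null) {
            CountedNode n = next;
            next = null;
            n.release();
        }
    }
}

final class CycleDemo {
    public static void main(String[] args) {
        CountedNode f = new CountedNode();
        CountedNode j = new CountedNode();
        f.count = 1;      // one external reference to F
        f.pointTo(j);     // F -> J
        j.pointTo(f);     // J -> F closes the cycle
        f.release();      // the external reference goes away
        // Both objects are now unreachable, yet neither count is 0.
        System.out.println(f.count + " " + j.count);   // prints "1 1"
    }
}
```

Each object in the cycle keeps the other's count at 1, so neither is ever reclaimed without help from some other mechanism.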

Page 35: Garbage Collection Mythbusters

Reference Counting Example

Objects F and J form a garbage cycle and also retain M.

[Figure: the heap after H and K were reclaimed; F, J, and M are unreachable from the Runtime Stack but still have nonzero reference counts.]

Page 36: Garbage Collection Mythbusters

Advanced Reference Counting

• Two-bit reference counts
  • Most objects pointed to by one or two references
  • When max count (3) is reached
    • Object becomes “sticky”
• Buffer reference updates
  • Apply them in bulk
• Combine with copying GC
• Use a backup GC algorithm
  • Handle cyclic garbage
  • Deal with “sticky” objects
  • Typically, the cyclic GC is a tracing GC
• Complex, and still non-moving

Page 38: Garbage Collection Mythbusters

Myth 2:

Reference counting would solve all my GC problems.

Busted!

Page 39: Garbage Collection Mythbusters


Myth 3:

GC with explicit deallocation would drastically improve performance.

Page 40: Garbage Collection Mythbusters

GC with Explicit Deallocation?

• Philosophically
  • Would compromise safety
• Practically
  • Not all GC algorithms can support it
  • Mark-Compact & Copying GCs
    • Do not maintain free lists
    • Reclaim space by moving live objects
      • Overwrite reclaimed objects
    • No way to reuse space from a single object
      • Unless the object is at the end of the heap

Page 41: Garbage Collection Mythbusters

GC with Explicit Deallocation? (ii)

• Explicit deallocation is incompatible with this model
  • Would compromise the very fast allocation path
• GCs have a different reclamation pattern
  • Reclaim objects in bulk
  • Free-space management is optimized for that
• Also applies to static analysis techniques
  • They can prove that an object can be safely deallocated
  • …but there is no mechanism to do the deallocation!

Page 42: Garbage Collection Mythbusters

How to deallocate?

• How can we deallocate the dead object when we only maintain top?

[Figure: the heap with used space below top and free space between top and end; the dead object lies in the middle of the used space.]
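The question above can be made concrete with a sketch of a heap that tracks nothing but `top`. `TopOnlyHeap` and `tryFree` are hypothetical names; the point is only that a single pointer cannot describe a hole in the middle of the used space.

```java
// Hypothetical heap whose only free-space bookkeeping is the top pointer.
final class TopOnlyHeap {
    private int top;
    private final int end;

    TopOnlyHeap(int end) { this.end = end; }

    // Linear allocation; returns the object's address, or -1 if the heap is full.
    int allocate(int size) {
        if (top + size > end) return -1;
        int address = top;
        top += size;
        return address;
    }

    // Space can be handed back only if the object is the last one before top.
    // Anything else would leave a hole that top alone cannot represent.
    boolean tryFree(int address, int size) {
        if (address + size != top) return false;
        top = address;
        return true;
    }

    int top() { return top; }
}
```

Supporting the failing case would mean adding a free list or similar structure, and consulting it on every allocation, which is exactly the cost the linear allocation path avoids.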

Page 44: Garbage Collection Mythbusters

Myth 3:

GC with explicit deallocation would drastically improve performance.

Busted!

Page 45: Garbage Collection Mythbusters


Myth 4:

Finalizers can (and should) be called as soon as objects become unreachable.

Page 46: Garbage Collection Mythbusters

Finalizers

• Typical use of Finalizers:
  • Reclaim external resources associated with objects in heap
    • e.g., native GUI components (windows, color maps, etc.)
• Finalizers are called on objects that GC has found to be garbage
• Tracing GC
  • Does not always have liveness information for every object
  • Liveness information up to date only at certain points
    • Immediately after a tracing cycle
  • Must finish a tracing cycle to find finalizable objects

Page 47: Garbage Collection Mythbusters

Finalization Reality Check

• Finalizers are not like C++ destructors
• No guarantees
  • When they will be run
  • Which thread will run them
  • Which will run first, second, … last
  • Or that they will be run at all!
• If you want prompt external resource reclamation
  • Don't rely on finalizers
  • Dispose explicitly instead
  • Use finalization as a safety net
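One way to dispose explicitly, assuming Java 7 or later, is to implement `AutoCloseable` and use try-with-resources. `NativeWindow` and `DisposeDemo` are hypothetical names, and the native handle is only simulated with a boolean.

```java
// Hypothetical wrapper around an external resource (e.g., a native window handle).
final class NativeWindow implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() { return open; }

    // Explicit, prompt disposal. Safe to call more than once.
    @Override
    public void close() {
        if (open) {
            open = false;
            // A real implementation would release the native handle here.
        }
    }
}

final class DisposeDemo {
    // Returns the window after the try block has finished, to show it was closed.
    static NativeWindow useWindow() {
        NativeWindow seen;
        try (NativeWindow w = new NativeWindow()) {
            seen = w;
            // ... use the window ...
        }   // close() runs here, on this thread, whether or not an exception was thrown
        return seen;
    }
}
```

The resource is released at a known point in the program rather than whenever (or if) the collector gets around to it.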

Page 49: Garbage Collection Mythbusters

Myth 4:

Finalizers can (and should) be called as soon as objects become unreachable.

Busted!

Page 50: Garbage Collection Mythbusters


Myth 5:

Garbage collection eliminates all memory leaks

Page 51: Garbage Collection Mythbusters

Unused Reachable Objects

• Consider the following code:

    class ImageMap {
        private final Map<File, Image> map = new HashMap<>();
        public void add(File file, Image img) { map.put(file, img); }
        public Image get(File file) { return map.get(file); }
        public void remove(File file) { map.remove(file); }
    }

    static ImageMap imageMap = new ImageMap();
    …
    File f = new File(imageFileName);
    Image img = readImage(f);
    imageMap.add(f, img);
    f = null;

Page 52: Garbage Collection Mythbusters

GC and Memory Leaks

• Consider the (f, img) tuple in the previous example
  • After we null f, the tuple is unused
    • We cannot retrieve it (don't have the key any more)
    • We cannot remove it (don't have the key any more)
  • So (f, img) will take up space while imageMap is alive
    • … without the application being able to access it
• GC reclaims unreachable objects
  • But not unused objects that are reachable
  • And it cannot know when a reachable object is unused
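The same effect can be reproduced in a few self-contained lines. This sketch uses a plain `Object` as the key so that no equal key can be recreated later; `LeakDemo`, `CACHE`, and `addAndForget` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class LeakDemo {
    // Long-lived map: everything it holds stays reachable for as long as it does.
    static final Map<Object, byte[]> CACHE = new HashMap<>();

    // Adds an entry, then forgets the only key that could look it up or remove it.
    static int addAndForget() {
        Object key = new Object();         // identity-based key
        CACHE.put(key, new byte[1024]);
        key = null;                        // the application can no longer reach the entry
        return CACHE.size();               // ...but the map still can, so GC must keep it
    }
}
```

Every call grows the map by one entry that the program can never use again, and no collector can tell that apart from an entry that is merely waiting to be used.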

Page 53: Garbage Collection Mythbusters

GC and Memory Leaks (ii)

• Effort required to track down such leaks
  • Can't override malloc anymore
• Tools are needed to help
  • Heap population statistics (what is being retained?)
  • Reachability information (why is it being retained?)

Page 55: Garbage Collection Mythbusters

Myth 5:

Garbage collection eliminates all memory leaks.

Busted!

Page 56: Garbage Collection Mythbusters


Myth 6:

I can get a GC that delivers very high throughput and very low latency.

Page 57: Garbage Collection Mythbusters

Throughput vs. Latency

• For most applications, GC overhead is small
  • 2% – 5%
• Throughput GCs
  • Move most work to GC pauses
  • Application threads do as little as possible
  • Least overall GC overhead
• Low-latency GCs
  • Move work out of GC pauses
  • Application threads do more work
    • Bookkeeping for GC more expensive
  • More overall GC overhead

Page 58: Garbage Collection Mythbusters

Throughput vs. Latency (ii)

• Goals are conflicting
  • GCs are architected differently
  • One GC does not rule them all
  • Must choose the best GC for the job
• Also consider another dimension:
  • Footprint
• Why can't the VM choose the right GC?
  • Impossible to know application priorities
  • Hints may help
    • …but for now, human must decide

Page 60: Garbage Collection Mythbusters

Myth 6:

I can get a GC that delivers very high throughput and very low latency.

Busted!

Page 61: Garbage Collection Mythbusters

Myth 7:

I need to disable GC in critical sections of my code.

Page 62: Garbage Collection Mythbusters

Why Disable GC?

• Application has a critical deadline
  • Display a video frame
  • Complete a stock trade
  • Adjust nuclear reactor control rods
• GC pause may cause deadline to be missed
  • Jittery video, missed profit, boom!
• So simply ...
  • Disable GC
  • Run without interruptions
  • Voilà! Meet your deadline

Page 63: Garbage Collection Mythbusters

Coding Without GC

• GC typically occurs because heap is full / nearly full
  • No GC → no allocation
  • Code in critical section cannot allocate safely
• Possible solution: Allocate in advance
  • Only access pre-allocated objects
  • Prohibit allocations during the critical section
    • How? Throw exception? Code analysis?
• Must know exactly what data is needed
  • Before entering critical section
  • Must audit every change to critical section
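"Allocate in advance" might look like the following sketch: a fixed ring of buffers created before the critical section and reused inside it. `FramePool` and `acquire` are hypothetical names, and the sketch assumes a single thread and that a buffer is no longer in use by the time the ring wraps around to it.

```java
// Hypothetical pool of pre-allocated frame buffers.
final class FramePool {
    private final byte[][] frames;
    private int next;

    // All allocation happens here, before the critical section starts.
    FramePool(int count, int frameSize) {
        frames = new byte[count][frameSize];
    }

    // Called inside the critical section: hands out an existing buffer, never allocates.
    byte[] acquire() {
        byte[] frame = frames[next];
        next = (next + 1) % frames.length;
        return frame;
    }
}
```

The catch is the one listed above: the number and size of the buffers must be known before the critical section begins, and every later change to that code has to be checked against the same limits.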

Page 64: Garbage Collection Mythbusters

Using Libraries

• Libraries freely allocate objects
  • And with good reason
    • Clear programming model
    • Lack of side-effects good for concurrency
• Can you always avoid using them in critical sections?
  • Concatenate two strings
  • Use the concurrency libraries
  • Add a new element to a collection
  • etc.

Page 65: Garbage Collection Mythbusters

A Few More Problems

• Other threads
  • Cannot allocate either
  • Stop them all?
    • What if they hold a lock you need? → deadlock
  • Overlapping critical sections
    • Can never do GC!!!
• Abuse
  • Long / unpredictable critical sections
    • blocking I/O, waiting for a lock, etc.
  • Libraries that have critical sections
    • We have seen many libraries that call System.gc()
    • -XX:+DisableCriticalSections?

Page 66: Garbage Collection Mythbusters

The Bottom Line

• It might work in very few, limited cases
  • Not a general-purpose solution
    • Too many ways to shoot yourself in the foot
• You should really consider looking at the RTSJ
  • Real-Time Specification for Java
  • But it also has a lot of the same problems we just described

Page 68: Garbage Collection Mythbusters

Myth 7:

I need to disable GC in critical sections of my code.

Busted!

Page 69: Garbage Collection Mythbusters


Myth 8:

GC settings that worked for my last app will also work for my next app.

Page 70: Garbage Collection Mythbusters

What Affects GC Performance?

• Application behavior
  • Allocation rate
    • Higher allocation rate → more frequent GCs
  • Live data size
    • More live data → longer tracing cycles
  • Mutation rate
    • Higher mutation rate → more load on the write barriers, hence more load on incremental GCs
• Hardware
  • Number of cores, clock rate, total RAM, cache sizes
• GC tuning parameters

Page 71: Garbage Collection Mythbusters

What Affects GC Performance? (ii)

• Primary factors
  • App behavior, hardware, tuning parameters
• Keep those factors constant
  • GC should have consistent performance
• Change any of them …
  • Examples:
    • Increase object lifetimes → increase GC time and/or frequency
    • Increase average object size → increase copying costs
    • Move to faster hardware → allocate more objects
  • GC performance will change, too

Page 72: Garbage Collection Mythbusters

GC Tuning Parameters

• Yet, we often see ...
  • Customers move tuning parameters from one app to another
• Transferring parameters
  • Mixed results at best
  • Sometimes it works
    • Maybe when defaults are really bad!
  • Usually it doesn't
    • Mostly luck
  • Performance often left on the table
• If applications are very similar
  • e.g., version 2 of the same app
  • Use previous tuning parameters only as a starting point
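Before carrying parameters over, it helps to measure what the collectors actually do in the new application. The sketch below reads the standard `java.lang.management` GC beans; `GcStats`, `totalCollections`, and `report` are hypothetical helper names, and which collectors appear depends on the JVM and the flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    // Sum of collection counts across all collectors in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    // One line per collector: name, collections so far, accumulated collection time.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }
}
```

Comparing such a report for the old and new application under the same settings is a quick way to see whether the inherited parameters are even a reasonable starting point.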

Page 73: Garbage Collection Mythbusters

Guaranteed GC Performance?

• In theory: yes
  • Real-Time GCs are available
  • But they require strict bounds on application characteristics
• Modern applications
  • Very large, very complex, very dynamic
  • Virtually impossible to analyze them
    • At least, to get realistic bounds
  • At best, approximations based on testing
• Realistically: no hard real-time guarantees
  • For non-trivial applications
  • Soft real-time at best…

Page 75: Garbage Collection Mythbusters

Myth 8:

GC settings that worked for my last app will also work for my next app.

Busted!

Page 78: Garbage Collection Mythbusters

Myth 9:

This talk is over.

Confirmed!

Page 79: Garbage Collection Mythbusters

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 80: Garbage Collection Mythbusters