Concurrent Garbage Collection

§ Deepak Sreedhar
§ JVM engineer, Azul Systems

Java User Group Bangalore, 7/27/15

© Copyright Azul Systems 2015
@azulsystems azulsystems.com

Slides: files.meetup.com/3189882/JUG_Concurrent_GC_Jul2015.pdf



About me: Deepak Sreedhar

Ø JVM student at Azul Systems

Ø Currently working on enhancing the C4 garbage collector implementation in Azul Zing JVM

Ø Prior experience with dynamic binary translation and server migration tools


Introduction


Quiz

Ø Does the Java spec mandate automatic GC?
Ø Is GC efficient?
Ø Can GC collect all dead objects?
Ø Can GC impact application throughput?
Ø Can GC impact application latency?
Ø Does a larger heap imply poorer performance?
Ø Does increasing Xmx (more free space) improve GC efficiency?


Terminology

Ø The Java heap memory
Ø Objects and references
Ø Live, reachable and dead objects
Ø Fragmentation and headroom wastage
Ø Virtual and physical memory
Ø Mutators
Ø Allocation and mutation rates


GC Safepoint

Ø A point in thread execution when GC can identify all references correctly, and there is no mutation
Ø Global safepoint (STW) – all threads are at safepoint
Ø Safepointing is not the same as halting. A thread running native code (JNI) is at a safepoint
Ø Time to safepoint is as crucial for low latency as is the GC operation time. Try -XX:+PrintGCApplicationStoppedTime
Ø Safepoints may be needed for non-GC reasons such as deoptimization and JVMTI heap iteration


GC classification

Ø Precise vs. Conservative
Ø Incremental vs. Monolithic
Ø Parallel vs. Serial
Ø Concurrent vs. Stop-the-world
Ø Multi-generational collectors
  • Weak generational hypothesis
  • Young (new) and Old (tenured) generation
  • Promotion (tenuring)
  • Shorter pauses usually in new gen (smaller set of live objects)
  • Remembered sets, card tables for cross-generational references
  • Can delay, but not avoid old gen collections
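
The card-table idea from the last bullets can be sketched roughly as follows. This is an illustrative model only, not how any particular JVM lays out its card table: the class name `CardTable`, the 512-byte card size and the method names are all assumptions. A write barrier dirties the card covering the updated old-gen field, and a young collection scans only dirty cards as extra roots.

```java
// Minimal card-table sketch. All names and the card size are assumptions.
final class CardTable {
    static final int CARD_SHIFT = 9;   // assumed 512-byte cards
    private final byte[] cards;

    CardTable(long oldGenBytes) {
        // One byte per card, rounding up so the last partial card is covered.
        cards = new byte[(int) ((oldGenBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT)];
    }

    // Write barrier: run after a reference is stored into an old-gen field
    // located at fieldOffset (an offset from the start of old gen).
    void recordWrite(long fieldOffset) {
        cards[(int) (fieldOffset >>> CARD_SHIFT)] = 1;
    }

    // Young collection: visit only dirty cards, clean them, and report how
    // many had to be scanned for old-to-young references.
    int scanAndCleanDirtyCards() {
        int dirty = 0;
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] != 0) {
                dirty++;
                cards[i] = 0;
            }
        }
        return dirty;
    }
}
```

The trade-off is imprecision: one dirty card forces a scan of every reference in that card, in exchange for a very cheap barrier.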


Copying collector

Ø Copy and fixup as objects are discovered
Ø “From” and “To” spaces
Ø Used for young (new) gen in many collectors
Ø Usually implemented as monolithic, stop-the-world
Ø Complexity of the order of live objects
Ø Theoretically, requires double the memory
Ø Practically many objects may be dead
  • Eden and survivor spaces
  • Early promotion to old gen when more memory is needed
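
A rough sketch of "copy and fixup as objects are discovered", in the style of a Cheney scan. It models the heap with ordinary Java objects, so `Node`, `forwardedTo` and `CopyingCollector` are hypothetical names and the "to-space" is just a list. Note that the work done depends only on the live objects; the dead node in the usage below is never touched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heap object: a name, outgoing references, and a forwarding slot.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node forwardedTo;   // set once this from-space object has been copied
    Node(String name) { this.name = name; }
}

final class CopyingCollector {
    // Copies everything reachable from roots; roots are updated in place.
    static List<Node> collect(List<Node> roots) {
        List<Node> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), toSpace));
        }
        // Cheney scan: to-space doubles as the work queue.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Node n = toSpace.get(scan);
            for (int f = 0; f < n.refs.size(); f++) {
                n.refs.set(f, copy(n.refs.get(f), toSpace));   // fixup
            }
        }
        return toSpace;
    }

    // Copy on first discovery; later discoveries reuse the forwarding pointer.
    private static Node copy(Node from, List<Node> toSpace) {
        if (from.forwardedTo == null) {
            Node to = new Node(from.name);
            to.refs.addAll(from.refs);   // still from-space refs, fixed by the scan
            from.forwardedTo = to;
            toSpace.add(to);
        }
        return from.forwardedTo;
    }
}
```

Usage: build a small graph with one unreachable node, call `CopyingCollector.collect(roots)`, and the returned to-space contains only the reachable copies.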


Mark Compact

Ø Separate mark and compact phases
Ø Mark (trace) - identify live objects
Ø Compact - Move objects to reduce fragmentation
Ø Compact to “To” space
Ø Complexity of the order of live objects
Ø Can be implemented incrementally
Ø Full compaction can be delayed


Mark Sweep Compact

Ø Mark - identify live objects
Ø Sweep – iterate over the heap and find free space
Ø Compact - Move objects to reduce fragmentation
Ø Used for old gen in many collectors
Ø Complexity of the order of heap size
Ø In-place, does not need more memory
Ø Can be implemented incrementally
Ø Can delay compaction to reduce pauses, but not eliminate it
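
The three phases can be sketched on a toy heap as below. This is a simplified model under stated assumptions: the heap is an array of slots, `HeapObj` and `MarkSweep` are made-up names, and because slots hold Java references the compaction step needs no pointer fixup (a real collector does). It shows why marking scales with the live set while sweeping scales with heap size.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical heap object with a mark bit.
final class HeapObj {
    final List<HeapObj> refs = new ArrayList<>();
    boolean marked;
}

final class MarkSweep {
    // Mark: work is proportional to the live set.
    static void mark(List<HeapObj> roots) {
        Deque<HeapObj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            HeapObj o = work.pop();
            if (o.marked) continue;
            o.marked = true;
            work.addAll(o.refs);
        }
    }

    // Sweep: visits every slot, so work is proportional to heap size.
    static int sweep(HeapObj[] heap) {
        int freed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            if (heap[i].marked) {
                heap[i].marked = false;   // reset for the next cycle
            } else {
                heap[i] = null;           // slot becomes free space
                freed++;
            }
        }
        return freed;
    }

    // Compact: slide survivors toward slot 0; returns the first free slot.
    static int compact(HeapObj[] heap) {
        int top = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                HeapObj o = heap[i];
                heap[i] = null;
                heap[top++] = o;
            }
        }
        return top;
    }
}
```

Skipping `compact` after `sweep` leaves holes in the array, which is the fragmentation that delayed compaction eventually has to pay for.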


Object allocation

Ø Increasing memory availability on servers – into the terabyte space
Ø Efficient allocation using Thread Local Allocation Buffers (TLAB) and simple “advance the top” algorithm
Ø Not many Java applications able to fully utilize this facility
Ø GC pauses (including in new gen)
Ø Difficulty in arriving at the right tuning
Ø Object pools, off heap memory used to get around this problem – not perfect solutions since memory management layer needs to be coded
Ø Can we have a continuously concurrent garbage collector?
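
The TLAB "advance the top" scheme mentioned above might look roughly like this. It is a sketch under assumptions, not JVM code: addresses are plain `long` offsets, `SharedHeap`, `Tlab` and the 1024-byte buffer size are invented for illustration, and leftover space in a retired buffer is simply abandoned. The point is that the common path is a compare and an add with no synchronization; only refills touch shared state.

```java
import java.util.concurrent.atomic.AtomicLong;

// Shared heap: threads carve out whole buffers with one atomic bump.
final class SharedHeap {
    private final AtomicLong top = new AtomicLong(0);
    private final long limit;

    SharedHeap(long limit) { this.limit = limit; }

    // Slow path. Returns the start of the new buffer, or -1 when exhausted.
    long carve(long bytes) {
        while (true) {
            long old = top.get();
            if (old + bytes > limit) return -1;
            if (top.compareAndSet(old, old + bytes)) return old;
        }
    }
}

// Thread-local allocation buffer: owned by exactly one thread.
final class Tlab {
    static final long TLAB_BYTES = 1024;   // assumed buffer size
    private final SharedHeap heap;
    private long top;                      // next free address in the buffer
    private long end;                      // end of the current buffer

    Tlab(SharedHeap heap) { this.heap = heap; }

    // Fast path: bump the pointer. Expects bytes > 0.
    // Returns the object's address, or -1 if the heap is full.
    long allocate(long bytes) {
        if (top + bytes > end) {           // buffer exhausted: refill
            long size = Math.max(TLAB_BYTES, bytes);
            long start = heap.carve(size);
            if (start < 0) return -1;      // a real JVM would trigger GC here
            top = start;
            end = start + size;
        }
        long addr = top;
        top += bytes;
        return addr;
    }
}
```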


Challenges and approaches


Concurrent Marking

Ø Marking – start from roots and traverse the object graph through discovered references
Ø Mutators can modify the object graph while GC is marking
  • Move a reference to an already visited portion of the graph
  • Remove references to an object from heap and keep a single reference in a register hiding it from GC marker
Ø Approaches
  • Incremental update – revisit root-set and modified portions of the graph iteratively, end with a re-mark pause
  • SATB (snapshot at the beginning) – intercept writes and store old contents into buffers
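
The hiding problem and the SATB remedy can be illustrated with a single-threaded simulation in which mutator writes are interleaved with marking steps. Everything here is a simplified assumption: `GcNode` has one reference field, and `SatbMarker` is a hypothetical name, not an API of any collector. The write barrier logs the old value of a field before overwriting it, so anything reachable when marking started still gets marked.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical object with a single reference field and a mark bit.
final class GcNode {
    GcNode ref;
    boolean marked;
}

final class SatbMarker {
    private final Deque<GcNode> markStack = new ArrayDeque<>();
    private final Deque<GcNode> satbBuffer = new ArrayDeque<>();
    private boolean markingActive;

    void start(GcNode root) {
        markingActive = true;
        markStack.push(root);
    }

    // Mutator write barrier: log the OLD value before it is overwritten.
    void write(GcNode holder, GcNode newValue) {
        if (markingActive && holder.ref != null) {
            satbBuffer.add(holder.ref);
        }
        holder.ref = newValue;
    }

    // One increment of collector work. Returns false when nothing is left.
    boolean step() {
        if (markStack.isEmpty()) {
            if (satbBuffer.isEmpty()) return false;
            markStack.push(satbBuffer.poll());   // drain logged old values
        }
        GcNode n = markStack.pop();
        if (!n.marked) {
            n.marked = true;
            if (n.ref != null) markStack.push(n.ref);
        }
        return true;
    }

    void finish() {
        while (step()) { }
        markingActive = false;
    }
}
```

Usage: with `root -> a -> b`, run one `step()` (root is now marked), then have the mutator load `b`, clear `a.ref` and store `b` into the already-visited `root`. Without the barrier `b` would never be marked; with it, `finish()` marks `b` from the buffer. The cost is floating garbage, since objects that die during marking stay marked for this cycle.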


Concurrent Compaction

Ø Mutators can modify an object while it is being copied
Ø Mutators can read an object using stale pointers after it has been copied
Ø Incremental compact - G1GC Approach
  • Divide heap into regions, maintain inter-region references using remembered sets
  • Minor collections use a copying collector
  • Some minor collections do incremental compaction for old gen
  • After concurrent mark, estimate efficiency of collecting regions; those with no or smaller RSets can be collected more easily, so they are prioritized for upcoming minor collections
  • Source regions updated while copying; RSet updates on new regions follow copying
  • Mark sweep compact for STW major collections
Ø Read Barriers


GC Barriers

Ø Instructions executed by mutators that aid garbage collection
Ø Help maintain metadata
Ø Impose invariants
Ø Write barriers
  • Update cross generation or cross region references
  • SATB barrier to ensure snapshot is fully marked
  • Incremental update barriers that store new references
Ø Read barriers
  • Baker-style barrier
  • Brooks-style forwarding pointer
  • C4 Loaded Value Barrier (LVB)
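
As one concrete read-barrier example, a Brooks-style forwarding pointer can be sketched as follows. This is a single-threaded illustration with invented names (`BrooksObject`, `forwardee`); a concurrent collector would need atomic operations when installing the forwarding pointer and when writing during a copy. Every object carries an extra word that points to itself until the object is moved, and every access pays one indirection through it.

```java
// Hypothetical object layout with a Brooks-style forwarding word.
final class BrooksObject {
    BrooksObject forwardee = this;   // initially forwards to itself
    int value;

    BrooksObject(int value) { this.value = value; }

    // Read barrier: always dereference through the forwarding pointer.
    static int read(BrooksObject ref) {
        return ref.forwardee.value;
    }

    // Writes resolve the same way so they land in the current copy.
    static void write(BrooksObject ref, int v) {
        ref.forwardee.value = v;
    }

    // Collector: copy the object, then publish the new location
    // in the old copy's forwarding word.
    static BrooksObject relocate(BrooksObject from) {
        BrooksObject to = new BrooksObject(from.value);
        from.forwardee = to;
        return to;
    }
}
```

Stale references keep working after `relocate` because they are redirected on each access, but they are never repaired in place, so the indirection cost is paid on every use.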


The Continuously Concurrent Compacting Collector (C4)


Loaded Value Barrier

Ø A read barrier that ensures, at time of load, that the following invariants are met before reference is visible to application

• If GC cycle is in marking phase, the reference will be marked through

• If GC cycle is in relocation phase, or has completed relocation but not fixup, the reference will be updated to point to the relocated object

Ø Simultaneously guarantees that
  • No reference misses GC attention during marking
  • There is no stale access to a compacted page

Ø The result of the load will always be a valid reference to a valid object


Self Healing

Ø Contents of source location overwritten with the result of LVB

Ø Loading from same source cannot trigger barrier again

Ø Critical property that ensures finite and predictable amount of work

Ø There may be “trap storms” at phase shifts, but they settle down as healing progresses and completes

Ø Unique to the C4 barrier (LVB)
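
The self-healing property can be shown with a toy model. This is not the Zing implementation: references are plain `int` addresses, the stale check is a map lookup rather than a test of reference metadata bits, and `LvbHeap`, `loadRef` and `trapCount` are names made up for the sketch. What it does show is that the slow path runs at most once per stale slot, because the barrier overwrites the source location with the corrected reference.

```java
import java.util.HashMap;
import java.util.Map;

// Toy heap: each slot holds a reference, modeled as an int address.
final class LvbHeap {
    // Forwarding information kept outside the slots: old address -> new address.
    final Map<Integer, Integer> forwarding = new HashMap<>();
    final int[] slots;
    int trapCount;   // how often the slow path ran

    LvbHeap(int... refs) {
        slots = refs.clone();
    }

    // Load barrier: check the loaded reference, correct it if it is stale,
    // and heal the slot it was loaded from.
    int loadRef(int slotIndex) {
        int ref = slots[slotIndex];
        Integer moved = forwarding.get(ref);
        if (moved != null) {          // slow path ("trap")
            trapCount++;
            ref = moved;
            slots[slotIndex] = ref;   // self-healing: next load is fast
        }
        return ref;
    }
}
```

Usage: create `new LvbHeap(100, 100, 200)`, record `forwarding.put(100, 900)`, and load slot 0 twice. The first load traps and returns 900; the second returns 900 without trapping. Slot 1 still traps once on its own first load, which is why the total work is bounded by the number of stale slots.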


Mark phase

Ø Like other collectors start from root set and traverse the object graph
Ø NMT (not marked through) LVB check – does reference metadata match expected GC state for the generation?
Ø Trap handling – Fix NMT state for the reference, heal the source location and add to collector’s work queue
Ø Checkpoints to clean stacks and transfer ref buffers
Ø Marking followed by a concurrent weak reference processing phase


Relocation phase

Ø Forwarding information kept outside of heap pages
Ø Virtual memory of compacted pages remains reserved until fixup is complete
Ø Physical memory can be released immediately (Quick Release) and recycled
Ø Hand over hand relocation – Each GC thread can complete with just one seed page
Ø Compacted pages are protected to catch accesses performed without LVB
Ø Mutators cooperate in the relocation if GC hasn’t moved the object yet at the relocate LVB trap
Ø Also heal the source memory with the new address of the object
Ø Large objects are just remapped to new virtual addresses, not physically copied
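
Mutator cooperation with an off-page forwarding table can be approximated like this. It is only a sketch under assumptions: `Relocator` is a hypothetical class, addresses are `int`s, the actual copying of bytes is left out, and `ConcurrentHashMap.computeIfAbsent` stands in for whatever synchronization a real collector uses. Whichever thread reaches an object first, GC thread or trapping mutator, performs the relocation, and everyone else observes the same new address.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class Relocator {
    // Side table kept outside the pages being compacted: old -> new address.
    private final ConcurrentHashMap<Integer, Integer> forwarding = new ConcurrentHashMap<>();
    private final AtomicInteger nextFree;
    final AtomicInteger copies = new AtomicInteger();   // relocations performed

    Relocator(int toSpaceStart) {
        nextFree = new AtomicInteger(toSpaceStart);
    }

    // Called by GC threads and by mutators in the barrier slow path.
    // The first caller for an address relocates; later callers reuse the result.
    int relocate(int oldAddress, int size) {
        return forwarding.computeIfAbsent(oldAddress, a -> {
            copies.incrementAndGet();
            return nextFree.getAndAdd(size);   // bump-allocate the new location
        });
    }
}
```

Because the table lives outside the source page, the page's contents are no longer needed once all its objects are relocated, which is what allows the physical memory to be released before fixup finishes.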


Fixup phase

Ø Traverse object graph and heal memory locations if not already done by mutators
Ø At end of fixup phase, virtual memory corresponding to compacted pages can be freed
Ø Can be combined with marking phase for next GC cycle, helping reduce GC cycle duration
Ø Mutators will do the fixup as part of LVB


Generational features

Ø New and old collections can proceed simultaneously and almost independently, unlike most collectors

Ø Perm gen processed by Old collector
Ø Old and new collectors use the same algorithm
Ø Synchronization using simple interlocks and limited suspension at phase changes
Ø Precise card marks for inter-generational references. Updated by Store Value Barriers (SVB)
Ø Can be extended to N generations


Heap management

Ø Allocation in 2 MB “pages”
Ø Quick Release allows physical pages to be recycled to satisfy allocation requests before fixup is complete
Ø New, old and perm gen pages interleaved in virtual space
Ø Tiered allocation - Objects divided into small, mid and large “spaces” based on size – helps limit maximum headroom wastage (currently 12.5%)
Ø TLABs for small space allocation, bump-the-pointer
Ø Relocation uses a different mechanism for each space to limit the maximum copy that a mutator needs to do


Zing Safepoints

Ø The C4 algorithm is pauseless, but the current implementation has a few short pauses, mostly at collector phase transitions (for ease and efficiency)

Ø Pause times independent of heap size, live object size, object lifetime, allocation rate, mutation rate, count of weak/soft/phantom references

Ø Provides sufficient safepoint opportunities to reduce time to bring threads to safepoint

Ø Pause times remain consistent
Ø Employs thread checkpoints when there is a specific action to be performed for/by that thread or when the thread needs to observe a GC state change


More on Zing

Ø GC scheduled by heuristics
Ø In most cases no tuning required
Ø Elastic memory - helps reduce occurrences of OOM
Ø Linux kernel module to improve performance of virtual memory operations


Keywords for reference search

Ø Talks by Gil Tene, CTO Azul Systems
Ø The Garbage Collection Handbook
Ø C4: The Continuously Concurrent Compacting Collector
Ø Garbage-First Garbage Collection
Ø Azul Zing JVM


Where Zing shines

Ø Low latency
  • Eliminate behaviour blips down to the sub-millisecond-units level
Ø Machine-to-machine “stuff”
  • Support higher *sustainable* throughput (one that meets SLAs)
  • Messaging, queues, market data feeds, fraud detection, analytics
Ø Human response times
  • Eliminate user-annoying response time blips. Multi-second and even fraction-of-a-second blips will be completely gone.
  • Support larger memory JVMs *if needed* (e.g. larger virtual user counts, or larger cache, in-memory state, or consolidating multiple instances)
Ø “Large” data and in-memory analytics
  • Make batch stuff “business real time”. Gain super-efficiencies.
  • Cassandra, Spark, Solr, DataGrid, any large dataset in fast motion


Q & A
