Big Data Volumes and Garbage Collection in Java

Big JVM and garbage collection, Alexey Ragozin, Sep 2012

Alexey Ragozin, Technical Lead, Caching and Data Grid Services, VP, Risk and PnL, Deutsche Bank

Transcript of "Big Data Volumes and Garbage Collection in Java"

Page 1:

Big JVM and garbage collection

Alexey Ragozin

Sep 2012

Page 2:

What is it all about?

• Automatic memory management, how it works
• Why the JVM needs Stop-the-World pauses
• Tuning GC in HotSpot JVM

Page 3:

Automatic memory management

Languages with automatic memory management
• Java, JavaScript, Erlang, Haskell, Python, PHP, C#, Ruby, Perl, Smalltalk, OCaml, Lisp, Scala, ML, Go, D, … and counting

Languages without automatic memory management
• C, C++, Pascal/Delphi, Objective-C
• Anything else, anyone?

Page 4:

How to manage memory?

Garbage – a data structure (object) in memory that is unreachable for the program.

How to find garbage?
• Reference counting
• Object graph traversal
• Do not collect garbage at all

Page 5:

Reference counting

+ Simple
+ No Stop-the-World pauses required
– Cannot collect cyclic graphs
– 15-30% CPU overhead
– Pretty bad for multi-core systems
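The "cannot collect cyclic graphs" point can be illustrated with a toy model. This is a minimal sketch, not how any real runtime implements reference counting: the `Node` class, `addRef` and `release` are hypothetical names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RefCountDemo {
    /** Toy object with a manually maintained reference count (hypothetical model). */
    static final class Node {
        int refCount;                              // number of incoming references
        boolean freed;                             // set when the count drops to zero
        final List<Node> fields = new ArrayList<>();
    }

    /** Stores a reference from one node to another and bumps the target's count. */
    static void addRef(Node from, Node to) {
        from.fields.add(to);
        to.refCount++;
    }

    /** Drops one reference; frees the node and releases its children at zero. */
    static void release(Node n) {
        if (--n.refCount == 0) {
            n.freed = true;
            for (Node child : n.fields) {
                release(child);
            }
        }
    }

    /** Two nodes referencing each other stay "alive" after the root lets go. */
    public static boolean cycleLeaks() {
        Node a = new Node();
        Node b = new Node();
        a.refCount = 1;        // reference held by a root
        addRef(a, b);
        addRef(b, a);          // closes the cycle
        release(a);            // root drops its reference
        return !a.freed && !b.freed;
    }

    /** An acyclic chain is reclaimed completely. */
    public static boolean chainFreed() {
        Node c = new Node();
        Node d = new Node();
        c.refCount = 1;        // reference held by a root
        addRef(c, d);
        release(c);
        return c.freed && d.freed;
    }

    public static void main(String[] args) {
        System.out.println("cycle leaks: " + cycleLeaks());
        System.out.println("chain freed: " + chainFreed());
    }
}
```

In the cyclic case each node still holds one reference to the other, so neither count ever reaches zero even though the program can no longer reach them.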

Page 6:

Object graph traversal

• Roots
  – Static fields
  – Local variables (stack frames)
• Reachable objects – alive
• Unreachable objects – garbage

In general, the graph should not be mutated during traversal. As a consequence, the application should be frozen while the runtime is collecting garbage.
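Reachability from roots can be sketched as a plain graph traversal. The example below is an illustration under simplifying assumptions: `Obj` is a hypothetical stand-in for a heap object, and the graph is not mutated while it is traversed (the "frozen application" case).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ReachabilityDemo {
    /** Toy heap object: just a list of outgoing references (hypothetical model). */
    static final class Obj {
        final List<Obj> refs = new ArrayList<>();
    }

    /** Returns every object reachable from the given roots. */
    static Set<Obj> reachable(List<Obj> roots) {
        Set<Obj> alive = Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (alive.add(o)) {            // first visit: follow its references
                pending.addAll(o.refs);
            }
        }
        return alive;
    }

    /** Builds a -> b plus an unreachable cycle c <-> d; only 'a' is a root. */
    public static int demoLiveCount() {
        Obj a = new Obj();
        Obj b = new Obj();
        Obj c = new Obj();
        Obj d = new Obj();
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        return reachable(Collections.singletonList(a)).size();
    }

    public static void main(String[] args) {
        System.out.println("live objects: " + demoLiveCount());
    }
}
```

Unlike reference counting, the unreachable cycle `c <-> d` is simply never visited, so it counts as garbage.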

Page 7:

Garbage collection: Algorithms

Copy collection
• Traverse object graph and copy reachable objects to other space
• Mark old space as free

Mark / Sweep
• Traverse object graph and mark reachable objects
• Scan (sweep) whole memory and “free” unmarked objects

Mark / Sweep / Compact
• … mark … sweep …
• Relocate live objects to defragment free space
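A copy collection pass can be sketched on a toy object model. This is an illustration only: `Obj`, `copy` and `forward` are hypothetical names, and a real collector copies raw memory and stores forwarding pointers in object headers rather than in a map.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class CopyCollectorDemo {
    /** Toy heap object (hypothetical model). */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Copies everything reachable from root into toSpace; returns the copy of root. */
    static Obj copy(Obj root, List<Obj> toSpace) {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>();   // old object -> its copy
        Deque<Obj> toScan = new ArrayDeque<>();               // copied but not yet scanned
        Obj newRoot = forward(root, forwarding, toSpace, toScan);
        while (!toScan.isEmpty()) {
            Obj old = toScan.poll();
            Obj copied = forwarding.get(old);
            for (Obj ref : old.refs) {
                copied.refs.add(forward(ref, forwarding, toSpace, toScan));
            }
        }
        return newRoot;
    }

    /** Returns the copy of an object, creating it on first visit. */
    static Obj forward(Obj old, Map<Obj, Obj> forwarding, List<Obj> toSpace, Deque<Obj> toScan) {
        Obj copied = forwarding.get(old);
        if (copied == null) {
            copied = new Obj(old.name);
            forwarding.put(old, copied);
            toSpace.add(copied);
            toScan.add(old);
        }
        return copied;
    }

    /** a <-> b are reachable, 'garbage' is not; returns how many objects were copied. */
    public static int demoCopiedCount() {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj garbage = new Obj("garbage");
        a.refs.add(b);
        b.refs.add(a);
        garbage.refs.add(a);
        List<Obj> toSpace = new ArrayList<>();
        copy(a, toSpace);
        return toSpace.size();
    }

    public static void main(String[] args) {
        System.out.println("objects copied: " + demoCopiedCount());
    }
}
```

Only live objects are touched; the old space, including the unreachable object, is then treated as free as a whole.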

Page 8:

Garbage collection: Economics

S – whole heap size
L – size of live objects

Copy collection throughput ≈ c · (S − L) / L
Mark / Sweep throughput ≈ c1 · (S − L) / L + c2 · (S − L) / S

(S − L) – total amount of garbage

For all algorithms based on reference reachability, GC efficiency is in reverse proportion to the amount of live objects.
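The inverse relation between live data and GC efficiency can be illustrated with a small calculation. The sketch assumes, as a simplification, that the work of a copy collection is proportional to the live size L and that it reclaims S − L; the method name `copyEfficiency` and the sample sizes are hypothetical.

```java
public class GcEconomicsDemo {
    /** Garbage reclaimed per unit of copy work, assuming work is proportional to live size. */
    public static double copyEfficiency(double heapSize, double liveSize) {
        return (heapSize - liveSize) / liveSize;
    }

    public static void main(String[] args) {
        double heapGiB = 10.0;                      // assumed heap size S
        double[] liveGiB = {1.0, 2.0, 5.0, 8.0};    // assumed live sizes L
        for (double live : liveGiB) {
            System.out.println("L=" + live + " GiB -> reclaimed per unit of work: "
                    + copyEfficiency(heapGiB, live));
        }
    }
}
```

With the same heap, growing the live set both shrinks the amount of garbage reclaimed and increases the work needed to reclaim it.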

Page 9:

Garbage collection: Generational approach

[Figure: heap demography. Death rate (byte/sec) plotted against object age, with the young GC period and the old GC period marked.]

Page 10:

WHAT DO WE HAVE IN TOOL BOX?

Page 11:

Garbage collection: Terms dictionary

Stop-the-world (STW) pause – a pause of all application threads required by the collector

Compacting algorithms – can move objects in memory to defragment free space

Parallel collection – uses multiple cores to reduce STW time

Concurrent collection – collection algorithms working in parallel with application threads

Page 12:

Garbage collection: Throughput vs low latency

Throughput algorithms
– minimize total time of program execution
– economically efficient CPU utilization

Low pause algorithms
– minimize time of individual STW pause
– may use background (concurrent) collection
– may use incremental collection

Page 13:

Oracle HotSpot JVM

Throughput algorithms
Parallel GC (-XX:+UseParallelOldGC)
• Young: Copy collector
• Old: Parallel Mark Sweep Compact

Low pause algorithms
Concurrent Mark Sweep (-XX:+UseConcMarkSweepGC)
• Young: Copy collector
• Old: Mark Sweep
– not compacting (prone to fragmentation)
– most work is in background
– young collections are STW

Page 14:

Oracle HotSpot JVM

Low pause algorithms
Garbage First – G1 (-XX:+UseG1GC)
• Young: Copy collector
• Old: Incremental copy collector
– incremental – more STW pauses, but shorter ones
– collects regions with more garbage first
– compacting, but had problems with large objects

G1 – algorithm of the future, hopefully not forever
– bad throughput
– pauses are normally twice as long as with CMS

Page 15:

Garbage collection: Generational approach

Young space collection
• High throughput
• Low memory utilization

Promotion
• Eden (nursery) -> Survivor (keep) space -> Old space

Old space collection
• Better memory utilization
• Orders of magnitude lower throughput

Memory barrier
• JVM “tracks” references from old to young space

Page 16:

Oracle’s HotSpot JVM

Default (serial) collector
• Young: Serial copy collector, Old: serial MSC

Parallel scavenge / Parallel old GC
• Young: Parallel copy collector, Old: serial MSC or parallel MSC

Concurrent mark sweep (CMS)
• Young: Serial or parallel copy collector, Old: concurrent mark sweep

G1 (garbage first)
• Young: Copy collector (region based), Old: Incremental MSC

http://blog.ragozin.info/2011/07/hotspot-jvm-garbage-collection-options.html

Page 17:

Oracle’s HotSpot JVM

Young collector | Old collector | JVM option
Serial (DefNew) | Serial Mark-Sweep-Compact | -XX:+UseSerialGC
Parallel scavenge (PSYoungGen) | Serial Mark-Sweep-Compact (PSOldGen) | -XX:+UseParallelGC
Parallel scavenge (PSYoungGen) | Parallel Mark-Sweep-Compact (ParOldGen) | -XX:+UseParallelOldGC
Serial (DefNew) | Concurrent Mark Sweep | -XX:+UseConcMarkSweepGC -XX:-UseParNewGC
Parallel (ParNew) | Concurrent Mark Sweep | -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
G1 | G1 | -XX:+UseG1GC

http://blog.ragozin.info/2011/09/hotspot-jvm-garbage-collection-options.html
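To check which young and old collectors a running JVM actually selected for a given set of options, the standard `java.lang.management` API can be queried. This is a small sketch; the class name `ActiveCollectors` is hypothetical, and the collector names printed depend on the JVM version and flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    /** Names of the garbage collectors registered in this JVM. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // One bean per collector, typically one for the young and one for the old space.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

Running the same class with different `-XX:+Use…GC` options is a quick way to confirm which combination from the table is in effect.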

Page 18:

Oracle’s JRockit JVM

-Xgc: option | Generational | Mark | Sweep/Compact
genconcon or gencon | Yes | concurrent | incremental
singleconcon or singlecon | No | concurrent | incremental
genconpar | Yes | concurrent | parallel
singleconpar | No | concurrent | parallel
genparpar or genpar | Yes | parallel | parallel
singleparpar or singlepar | No | parallel | parallel
genparcon | Yes | parallel | incremental
singleparcon | No | parallel | incremental

http://blog.ragozin.info/2011/07/jrockit-gc-in-action.html

Page 19:

Azul Zing JVM

• Generational GC
• Young – Concurrent mark sweep compact (MSC)
• Old – Concurrent mark sweep compact (MSC)

Azul Zing can relocate objects in memory without an STW pause. The secret is a read barrier.

Requires special Linux kernel modules to run.

Page 20:

JVM HEAP SIZE AND PAUSES

Page 21:

Concurrent Mark Sweep

Initial mark – Stop-The-World
• Collect root references (thread stacks)
• Mark them as gray

Concurrent mark – concurrent
• Do three color marking until grays are exhausted
• Mark all black objects on dirty regions as gray (by card table)
• Repeat

Remark – Stop-The-World
• Final remark

Sweep – concurrent
• Scan heap and reclaim white objects
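The mark and sweep phases can be sketched with explicit colors. This is a single-threaded illustration only: it leaves out the concurrent part (card table re-marking and the remark pause), and `Obj`, `mark` and `sweep` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class TriColorDemo {
    enum Color { WHITE, GRAY, BLACK }

    /** Toy heap object with a mark color (hypothetical model). */
    static final class Obj {
        final String name;
        Color color = Color.WHITE;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Initial mark (roots become gray), then marking until no gray objects remain. */
    static void mark(List<Obj> roots) {
        Deque<Obj> gray = new ArrayDeque<>();
        for (Obj root : roots) {
            if (root.color == Color.WHITE) {
                root.color = Color.GRAY;
                gray.add(root);
            }
        }
        while (!gray.isEmpty()) {
            Obj o = gray.poll();
            for (Obj ref : o.refs) {
                if (ref.color == Color.WHITE) {   // newly discovered object
                    ref.color = Color.GRAY;
                    gray.add(ref);
                }
            }
            o.color = Color.BLACK;                // all references of o have been visited
        }
    }

    /** Removes white objects from the heap and returns their names. */
    static List<String> sweep(List<Obj> heap) {
        List<String> reclaimed = new ArrayList<>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.color == Color.WHITE) {
                reclaimed.add(o.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    /** Heap a -> b plus unreachable c; returns the names reclaimed by the sweep. */
    public static List<String> demoSweptNames() {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj c = new Obj("c");
        a.refs.add(b);
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c));
        mark(Collections.singletonList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("reclaimed: " + demoSweptNames());
    }
}
```

In the concurrent case the application may store a reference to a white object into an already black one, which is why CMS re-grays black objects on dirty cards and needs a final Stop-The-World remark.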

Page 22:

Cost structure of pauses (CMS): Summary of pauses

Page 23:

MOVING OUT OF HEAP

Page 24:

Direct memory buffers

java.nio.ByteBuffer.allocateDirect()

Pro
• Memory is allocated out of heap
• Memory is deallocated when ByteBuffer is collected
• Cross platform, native Java

Con
• Fragmentation of non-heap memory
• Memory is deallocated when ByteBuffer is collected
• Complicated multi-threaded programming
• -XX:MaxDirectMemorySize=<value>
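A minimal use of a direct buffer looks like this. The sketch only shows allocation and absolute reads and writes; the class name `DirectBufferDemo` and the buffer size are arbitrary choices for the illustration.

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    /** Writes a long into off-heap memory and reads it back. */
    public static long roundTrip(long value) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(64);  // 64 bytes outside the Java heap
        buffer.putLong(0, value);                           // absolute write at offset 0
        return buffer.getLong(0);                           // absolute read at offset 0
    }

    /** Confirms that the buffer is backed by direct (non-heap) memory. */
    public static boolean isDirect() {
        return ByteBuffer.allocateDirect(16).isDirect();
    }

    public static void main(String[] args) {
        System.out.println("direct: " + isDirect());
        System.out.println("value: " + roundTrip(42L));
    }
}
```

There is no explicit free call: the native memory is released only after the `ByteBuffer` object itself is garbage collected, which is why the same property is listed as both a pro and a con.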

Page 25:

RTSJ

Scoped memory
• Objects can be allocated in chosen memory areas
• Scoped and immortal areas are not garbage collected
• Scoped areas can be released as a whole
• Cross references between areas are limited, and this limitation is enforced at run time

Page 26:

Unsafe Java

sun.misc.Unsafe
• Unsafe.allocateMemory(…)
• Unsafe.reallocateMemory(…)
• Unsafe.freeMemory(…)
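The three calls can be exercised as below. This is a sketch under assumptions: `sun.misc.Unsafe` is an unsupported internal API, obtaining it through the `theUnsafe` field via reflection is a common workaround rather than an official entry point, and recent JDK releases deprecate these memory methods and may warn about or reject them. The class name `UnsafeMemoryDemo` is hypothetical.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeMemoryDemo {
    /** Obtains the Unsafe singleton through reflection (unsupported, may break on future JDKs). */
    static Unsafe unsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    /** Allocates native memory, grows it, stores and reads a long, then frees it. */
    public static long roundTrip(long value) {
        Unsafe u = unsafe();
        long address = u.allocateMemory(16);            // raw native allocation, not tracked by GC
        try {
            address = u.reallocateMemory(address, 32);  // block may move; use the returned address
            u.putLong(address, value);
            return u.getLong(address);
        } finally {
            u.freeMemory(address);                      // must be freed manually
        }
    }

    public static void main(String[] args) {
        System.out.println("value: " + roundTrip(42L));
    }
}
```

Unlike direct buffers, nothing here is released automatically, so a missed `freeMemory` call is a native memory leak.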

Page 27:

Thank you

Alexey Ragozin

http://blog.ragozin.info – my articles

Page 28:

YOUNG COLLECTION

Page 29:

Memory spaces in HotSpot JVM

Memory geometry
• Young space: -XX:NewSize=<n> -XX:MaxNewSize=<n>
• Survivor space: Young space / (-XX:SurvivorRatio=<n> + 2)
• Young + old space: -Xms<n> -Xmx<n>
• Permanent space: -XX:PermSize=<n> -XX:MaxPermSize=<n>

[Diagram: Eden | Survivor 1 | Survivor 2 | Tenured | Permanent]

* G1 has the same set of spaces, but they are dynamic sets of regions rather than contiguous address ranges

Page 30:

How does young collection work?

Collect root references
• Stack frame references
• References from other spaces (tenured + permanent). Does it mean scanning old space?

Traverse object graph
• Visit only live objects
• Copy live objects to another region of young space or to old space

Consider whole Eden and old survivor space to be free memory

A write barrier is required to effectively collect references from old to young space.

Page 31:

How young collector works: Collecting root references

[Diagram: Eden | S1 | S2 | Tenured, with dirty cards marked]

Collect roots for young GC
• Scan thread stacks
• Scan dirty pages in old space

Card marking barrier
Each 512 bytes of heap is associated with a flag (card). Once a reference is written to memory, the associated card is marked dirty.
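The card marking idea can be sketched as a byte array indexed by address. This is a toy model, not HotSpot's implementation: the class `CardTableDemo`, its method names, and the use of plain heap offsets are assumptions made for the illustration; only the 512-byte card size comes from the slide.

```java
public class CardTableDemo {
    static final int CARD_SHIFT = 9;            // 2^9 = 512 bytes of heap per card
    private final byte[] cards;                 // one flag per card, 1 = dirty

    CardTableDemo(long heapBytes) {
        cards = new byte[(int) ((heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT)];
    }

    /** Write barrier: called whenever a reference is stored at the given heap offset. */
    void onReferenceWrite(long heapOffset) {
        cards[(int) (heapOffset >>> CARD_SHIFT)] = 1;
    }

    boolean isDirty(int cardIndex) {
        return cards[cardIndex] == 1;
    }

    /** Number of dirty cards, i.e. 512-byte blocks the young collector has to scan. */
    int dirtyCount() {
        int count = 0;
        for (byte card : cards) {
            count += card;
        }
        return count;
    }

    /** Two writes into the same 512-byte block and one into the next block. */
    public static int demoDirtyCount() {
        CardTableDemo table = new CardTableDemo(4096);   // 8 cards
        table.onReferenceWrite(513);                     // card 1
        table.onReferenceWrite(1023);                    // card 1 again
        table.onReferenceWrite(1024);                    // card 2
        return table.dirtyCount();
    }

    public static void main(String[] args) {
        System.out.println("dirty cards: " + demoDirtyCount());
    }
}
```

The barrier is deliberately cheap (a shift and a byte store); the cost is moved to the collector, which has to scan every dirty 512-byte block for references into young space.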

Page 32:

How does young collection work? Copying live objects

[Diagram: Eden | S1 | S2 | Tenured, with dirty cards marked]

Collect roots for young GC
• Clean cards
• Recursive copy of live objects (only live objects are traversed)

The card table is reset just before the copy collector starts to move objects.

Page 33:

How does young collection work? Collection finished

[Diagram: Eden | S1 | S2 | Tenured, with dirty cards marked]

Since every object in young space has been relocated, a clean card means that there are no references to young space in that particular 512 bytes of heap.

Page 34:

Thread local allocation blocks: TLAB in HotSpot JVM

• Each thread preallocates a block in Eden
• Thread allocates new objects in its TLAB
• When the TLAB is full, a new TLAB is allocated
• If an object does not fit in the TLAB
  • Allocate in Eden space
  • If it does not fit in Eden (or exceeds -XX:PretenureSizeThreshold)
    • Allocate in old space
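The allocation path above can be sketched with two bump pointers, one shared and one per thread. This is a simplified model: `Eden`, `Tlab` and the byte sizes are hypothetical, addresses are plain offsets, and the fallback to old space is reduced to returning -1.

```java
public class TlabDemo {
    /** Shared Eden space with a synchronized bump pointer (hypothetical model). */
    static final class Eden {
        private final long size;
        private long top;
        Eden(long size) { this.size = size; }

        /** Returns the offset of the new block, or -1 if Eden is exhausted. */
        synchronized long allocate(long bytes) {
            if (top + bytes > size) {
                return -1;
            }
            long address = top;
            top += bytes;
            return address;
        }
    }

    /** Per-thread allocation block: no synchronization on the fast path. */
    static final class Tlab {
        private final Eden eden;
        private final long tlabSize;
        private long top;
        private long end;

        Tlab(Eden eden, long tlabSize) {
            this.eden = eden;
            this.tlabSize = tlabSize;
            refill();
        }

        private boolean refill() {
            long start = eden.allocate(tlabSize);
            if (start < 0) {
                return false;
            }
            top = start;
            end = start + tlabSize;
            return true;
        }

        long allocate(long bytes) {
            if (bytes > tlabSize) {
                return eden.allocate(bytes);   // does not fit any TLAB: allocate in shared Eden
            }
            if (top + bytes > end && !refill()) {
                return -1;                     // Eden exhausted: a real JVM would collect or use old space
            }
            long address = top;                // fast path: bump the thread-local pointer
            top += bytes;
            return address;
        }
    }

    /** Offsets returned for allocations of 100, 100, 100 and 300 bytes. */
    public static long[] demoOffsets() {
        Tlab tlab = new Tlab(new Eden(1024), 256);
        return new long[] {
            tlab.allocate(100), tlab.allocate(100), tlab.allocate(100), tlab.allocate(300)
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demoOffsets()));
    }
}
```

The third allocation no longer fits the first 256-byte block, so a new TLAB is taken from Eden; the 300-byte request is larger than a TLAB and goes straight to the shared Eden pointer.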

Page 35:

Young collection stop-the-world

Total STW time
• Collect roots
  – Scan thread stacks
  – Scan dirty cards
    · Read card table ~ S_heap
    · Scan pages marked as dirty ~ [expression not recoverable from the transcript]
• Copy live objects
• Process special references

* You can use -XX:+PrintGCTaskTimeStamps to analyze time of individual phases

* You can use -XX:+PrintReferenceGC to analyze reference processing times

Page 36:

OLD SPACE COLLECTION

Page 37:

HotSpot: Old space collection

Stop-the-World Mark-Sweep-Compact
• Single threaded
• Multithreaded

Concurrent Mark Sweep (CMS)
• Background collection of old space

G1 (Garbage First)
• Incremental Stop-the-World collection

Page 38:

HotSpot: Old space collection: Concurrent Mark Sweep

HotSpot’s CMS (Concurrent Mark Sweep)
• Does not compact
• Prone to fragmentation
• Uses separate free lists for each object size
• Uses statistics to manage fragmentation
• Introduces 2 short STW phases

Page 39:

HotSpot: Old space collection: Incremental collection

HotSpot’s G1
• Space is divided into regions
• Regions can be collected individually
• Write barrier tracks references between regions
• A subset of regions is collected during an STW pause
  – Live objects are “evacuated” to other regions
• Young collections – all Eden regions collected
• Partial collection – a few old regions collected
• Global marking is used to estimate live population

Page 40:

Concurrent Mark Sweep: Three color marking

[Figure: three color marking of the object graph, starting from roots]

Page 41:

Concurrent Mark Sweep: Three color marking

[Figure: three color marking of the object graph, starting from roots]

Page 42:

Concurrent Mark Sweep: Three color marking

[Figure: three color marking of the object graph, starting from roots]

Page 43:

Concurrent Mark Sweep: Three color marking

[Figure: three color marking of the object graph, starting from roots]

Page 44:

Concurrent Marking Artifacts: SATB barrier example

[Figure: objects A, B, C, D; GC labels: A, C]

Page 45:

Concurrent Marking Artifacts: SATB barrier example

[Figure: objects A, B, C, D; GC labels: C, D]

Page 46:

Concurrent Marking Artifacts: SATB barrier example

[Figure: objects A, B, C, D; GC labels: C, D; reference queue: B, D]

Page 47:

Concurrent Marking Artifacts: SATB barrier example

[Figure: objects A, B, C, D; GC labels: C, D]

Page 48:

Concurrent Marking Artifacts: SATB barrier example

[Figure: objects A, B, C, D; GC labels: D, D]

Page 49:

Concurrent Marking Artifacts: SATB barrier example

[Figure: objects A, B, C, D; reference queue: B, D]

Page 50:

Concurrent Marking Artifacts: SATB barrier example

[Figure: objects A, B, C, D; reference queue states: B, D, empty]

Page 51:

Concurrent Mark Sweep

Initial mark – Stop-The-World
• Collect root references (thread stacks)
• Mark them as gray

Concurrent mark – concurrent
• Do three color marking until grays are exhausted
• Mark all black objects on dirty regions as gray (by card table)
• Repeat

Remark – Stop-The-World
• Final remark

Sweep – concurrent
• Scan heap and reclaim white objects

Page 52:

Cost structure of pauses (CMS): Summary of pauses

Page 53:

Patching OpenJDK: Serial collector gain

http://aragozin.blogspot.com/2011/07/openjdk-patch-cutting-down-gc-pause.html

Page 54:

Patching OpenJDK: CMS collector gain

http://aragozin.blogspot.com/2011/07/openjdk-patch-cutting-down-gc-pause.html

Page 55:

Concurrent Mark Sweep: Full GC

Concurrent mode failure
If background collection cannot free memory fast enough, CMS will perform a Stop-The-World single-threaded Mark-Sweep-Compact.

Promotion failure
Due to fragmentation, old space may not have a contiguous block of memory to accommodate a promoted object even if free space is available. CMS will perform a Stop-The-World single-threaded Mark-Sweep-Compact to defragment memory.

Page 56:

TUNING, TROUBLESHOOTING

Page 57:

Common reasons for long STW

[Times: user=0.53 sys=0.06, real=0.15 secs]

• Full GC
• OS swapping
• Too many survivors in young space
• Long reference processing
• JNI delays
• Long CMS initial mark / remark

Page 58:

CMS check list

• jdk6u22 - jdk6u26 – broken free lists logic
• -XX:CMSWaitDuration=…
• -XX:+CMSScavengeBeforeRemark
• -XX:-CMSConcurrentMTEnabled
• Consider CMS for permanent space
• Size your heap: -Xmn / -Xms / -Xmx
  – Expected data + young space + CMS overhead
  – CMS overhead ~30% of expected data
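The sizing rule can be turned into a quick calculation. This is a back-of-the-envelope sketch: the method name `suggestedHeapMiB` and the sample numbers are hypothetical, and the 30% figure is the slide's rule of thumb rather than a guarantee.

```java
public class CmsHeapSizing {
    /** Heap size = expected data + young space + CMS overhead (~30% of expected data). */
    public static long suggestedHeapMiB(long expectedDataMiB, long youngSpaceMiB) {
        long cmsOverheadMiB = expectedDataMiB * 30 / 100;
        return expectedDataMiB + youngSpaceMiB + cmsOverheadMiB;
    }

    public static void main(String[] args) {
        long expectedDataMiB = 10240;   // assumed: 10 GiB of long-lived data
        long youngSpaceMiB = 1024;      // assumed: 1 GiB young space (-Xmn)
        long heapMiB = suggestedHeapMiB(expectedDataMiB, youngSpaceMiB);
        System.out.println("-Xmn" + youngSpaceMiB + "m -Xms" + heapMiB + "m -Xmx" + heapMiB + "m");
    }
}
```

The overhead term is the headroom that lets the background collection finish before old space fills up.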

Page 59:

Tuning young collection

Eden size
• too small – frequent YGC, objects promoted to old space early
• too large – more long-lived objects need to be copied

Survivor space size
• too small – overflow, objects prematurely promoted
• too large – memory wasted

Tenuring threshold
• higher – objects are kept in young space for longer
• higher – more objects in young space, more copy time

Page 60:

Tuning young collection

Eden size
• -XX:MaxNewSize=<n> -XX:NewSize=<n>
• Eden size = new size – 2 * survivor space size

Survivor space size
• -XX:SurvivorRatio=<n>
• Survivor space size = new size / (survivor ratio + 2)

Tenuring threshold
• -XX:MaxTenuringThreshold=<n>
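The young space geometry can be worked out from the two options. In HotSpot, -XX:SurvivorRatio is the ratio of Eden to a single survivor space, so the young space is split into (ratio + 2) equal parts. The sketch below ignores the alignment and rounding a real JVM applies; the class and method names are hypothetical.

```java
public class YoungGeometry {
    /** Size of one survivor space: young space is Eden (ratio parts) plus two survivors (1 part each). */
    public static long survivorMiB(long newSizeMiB, int survivorRatio) {
        return newSizeMiB / (survivorRatio + 2);
    }

    /** Eden is what remains after both survivor spaces are taken out. */
    public static long edenMiB(long newSizeMiB, int survivorRatio) {
        return newSizeMiB - 2 * survivorMiB(newSizeMiB, survivorRatio);
    }

    public static void main(String[] args) {
        long newSizeMiB = 1000;   // assumed -XX:NewSize / -XX:MaxNewSize
        int survivorRatio = 8;    // assumed -XX:SurvivorRatio
        System.out.println("survivor: " + survivorMiB(newSizeMiB, survivorRatio) + " MiB each");
        System.out.println("eden: " + edenMiB(newSizeMiB, survivorRatio) + " MiB");
    }
}
```

With a 1000 MiB young space and a ratio of 8, each survivor space gets 100 MiB and Eden gets 800 MiB.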

Page 61:

Tuning young collection

Small heap sizes
• Balance tenuring threshold / survivor space to keep objects in limited young space for longer

Large heap sizes (4 GiB and greater)
• Limit tenuring threshold to avoid increase in copy time
• Limit survivor space to avoid accidental long young collections
• Increase Eden size instead of increasing tenuring threshold

Page 62:

Tuning young collection

GC tuning is based on the application's allocation pattern

If the application's allocation pattern changes, you are in trouble

In practice, an application always has different “modes of operation”

GC tuning – choosing the lesser evil

Page 63:

Diagnostics

Page 64:

Surviving with huge heap

• CMS is very good in terms of pauses
  – You can reliably keep pauses under 50 ms – 150 ms on 30 GiB – 50 GiB heaps
• Fragmentation threat
  – Not a big deal for server type of applications
  – XML processing is a GC disaster
• Very narrow GC comfort zone
  – If you tune for the “long run”, you are likely to have pauses during initial loads / bulk refreshes