GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation -...

47
HBase performance improvements GC settings for HBase Obscure HBase features

Transcript of GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation -...

Page 1: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

● HBase performance improvements● GC settings for HBase● Obscure HBase features

Page 2: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

About Me● Lars Hofhansl● Architect at Salesforce● HBase Committer, HBase PMC member● HBase blog: http://hadoop-hbase.blogspot.com● Generalist.

○ have automated oil refineries using Fortran○ real time communications C/C++○ hacked databases and appservers○ implemented Javascript UI frameworks○ mostly a backend plumber, recently mostly Java○ Aikido black belt

Page 3: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Part I - Performance

(some of the following slides are lifted/adapted from a talk my coworker Jamie Martin gave some time back)

Page 4: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Modern HW architecture

Page 5: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Jeff Dean's "Numbers Everyone Should Know"+ some of my own

Page 6: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

● CPU ~3GHZ (.33ns cycle time)● L1 cache reference .5ns● L2 cache reference 7ns● Main memory reference 100ns● SSD seek: 100,000ns● Read 1MB sequentially from memory 250,000ns● Round trip within same datacenter 500,000ns● Disk seek: 10,000,000ns● Read 1 MB sequentially from network 10,000,000ns● Read 1MB sequentially from disk 20,000,000ns● Read 1TB sequentially from disk: hours● Read 1TB randomly from disk: 10s of days

Page 7: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

CPU is idle for ~300 cycles during a memory fetch!

Page 8: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

CPU is idle for ~300 cycles during a memory fetch!

Page 9: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

In 1 disk seek, 1mb can be read from disk over the net.

Page 10: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

In 1 disk seek, 1mb can be read from disk over the net.

Page 11: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

RAM is the new disk, disk is the new tape

Page 12: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

RAM is the new disk, disk is the new tape(tape is the new horse-carriage delivery)

Page 13: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

SSDs change the equation

Page 14: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

SSDs change the equation● IO bounds tasks are now CPU bound● Random IO almost as good as sequential IO● Wear-Out a real concern

Page 15: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Trends:

● CPU speed improves 60% per year, DRAM speed about 10%

● GAP grows 50% year over year● Too much heat beyond 4 GHZ -> multiples

cores, h/w threads

Page 16: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Performance Gap: Memory and disks are getting further and further away from the CPU...

Page 17: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Each CPU core has its own L1/L2 cache.●cores can see changes out of order● reordering limited via memory barriers

Page 18: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Jim Gray: 100x faster cache at 90% hit rate makes effective perf improvement of 10x

Page 19: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Must design with cache coherency/locality in mind●Avoid memory barriers●Avoid unnecessary copies in RAM

(remember RAM is a slow, shared resource now, that does not scale with the cores)

Page 20: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

A memory barrier (or fence) ensures that all prior reads or writes are seen in the same order by all cores. (details are tricky, and I am not a hardware guy)

Page 21: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Memory barriers in Java:● synchronized - sets read and write barriers as needed

(details depend on JVM, version, and settings)● volatile - sets a read barrier before a read to a volatile,

and write barrier after a write● final - sets a write barrier after the assignment● AtomicInteger, AtomicLong, etc - uses volatiles and

hardware CAS instructions● Explicit Locks - same as synchronized in terms of barriers

Page 22: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Memory barriers in Java:● synchronized - sets read and write barriers as needed

(details depend on JVM, version, and settings)● volatile - sets a read barrier before a read to a volatile,

and write barrier after a write● final - sets a write barrier after the assignment● AtomicInteger, AtomicLong, etc - uses volatiles and

hardware CAS instructions● Explicit Locks - same as synchronized in terms of barriers

TL;DR: Avoid these in hot code paths

Page 23: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

But what about HBase?

Page 24: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

HBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

HBASE-6621 - Reduce calls to Bytes.toInt

HBASE-6711 - Avoid local results copy in StoreScanner

HBASE-7180 - RegionScannerImpl.next() is inefficient

HBASE-7279 - Avoid copying the rowkey in RegionScanner, StoreScanner, and

ScanQueryMatcher

HBASE-7336 - HFileBlock.readAtOffset does not work well with multiple threads

HBASE-9915 - Performance: isSeeked() in EncodedScannerV2 always returns false

HBASE-9807 - block encoder unnecessarily copies the key for each reseek

HBASE-10015 - Replace intrinsic locking with explicit locks in StoreScanner

HBASE-10117 - Avoid synchronization in HRegionScannerImpl.isFilterDone

HBASE-6852 - SchemaMetrics.updateOnCacheHit costs too much while full scanning a table

with all of its fields (Cheng Hao)

Page 25: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

2x improvement for cached scans+

2-3x improvement for block encoders

Page 26: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Garbage Collection

Page 27: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often
Page 28: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

“Weak generational hypothesis”

Page 29: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

“Weak generational hypothesis”

Page 30: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

HotSpot manages four generations:

1. Eden for all new objects2. Survivor I and II where surviving objects are promoted when eden

is collected3. Tenured space. Objects surviving a few rounds (16 by default) of

eden/survivor collection are promoted into the tenured space4. Perm gen for classes, interned strings, and other more or less

permanent objects.

Page 31: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

The principle costs to the garbage collector

1. tracing all objects from the "root" objects and2. collecting all unreachable objects.

#1 is expensive when many objects need to be traced, and #2 is expensive when objects have to be moved (for example to reduce memory fragmentation)

Page 32: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Block cache and Memstore live longAllocated in chunks

64k blocks and 2mb slabs

Page 33: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Other objects don’t outlive a single RPC

Page 34: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

-Xmn256m - small eden (maybe -Xmn1024m, but not more than that)-XX:+UseParNewGC - collect eden in parallel-XX:+UseConcMarkSweepGC - use the non-moving CMS collector-XX:CMSInitiatingOccupancyFraction=70 - start collecting when 70% of the tenured gen are full to avoid collection under pressure-XX:+UseCMSInitiatingOccupancyOnly - do not try to adjust CMS setting

TL;DR:

Page 35: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Observations with typical workloads:

Average GC times ~50ms99'ile GC times ~150msworst case <1500ms (very rare)

all on contemporary hardware with heaps of 32GB

Page 36: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

What about G1?

Page 37: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Dunno

Page 38: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Nasty single threaded full GC fallback in 1.7Might be OK in 1.8

Haven’t tried

Page 39: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Part II - Lesser known HBase features

(Or is it just a crayon up the nose?)

Page 40: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

1. KEEP_DELETED_CELLS2. MIN_VERSIONS3. Atomic Row Mutations4. MultiRowMutationEndpoint5. BulkDeleteEndpoint6. Filters and essential column families

Page 41: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

KEEP_DELETED_CELLS

● HBase allows flashback queries○ Ability to get the state of your data as of a time○ Scan.setTimeRange, Scan.setTimeStamp

● This does not work with deletes○ Put(X) at TS○ Put(X') at TS+1○ Scan at TS: X○ Delete(X) at TS+2○ Scan at TS or TS+1: nil

● KEEP_DELETED_CELLS keeps deleted cells until they naturally expire (TTL or number of versions)○ Scan at TS: X○ Scan at TS+1: X'

● Delete marker collected after two major compactions

Page 42: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

MIN_VERSIONS

● HBase supports TTL (compliments of Andy Purtell)○ data in column family is assigned a maximum life time, after

that it is unconditionally deleted.

● What if you wanted to expire your data after a while, but keep at least the last (N) version(s)?○ For example you want to keep 24h of data, but keep at least

one version if no change was made in 24h.

● Enter MIN_VERSIONS.○ Works together with TTL.○ Defines the number of versions to retain even when their TTL

has expired

Page 43: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Atomic Row Mutations

● Atomically apply multiple operation to a row:HTable.mutateRow(RowMutation)

● RowMutations implements the Row interface:List<Row> rows = ...;rows.add(new RowMutation(row1));rows.add(new RowMutation(row2));t.batch(rows);

Page 44: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Multi Row Transactions

● HBase supports multi row transaction if rows are colocated in the same regions

● Colocation is controlled by a SplitPolicy● Transactions are implemented in

MultiRowMutationEndpoint

Page 45: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Filter with essential families

● Filters in HBase are powerful○ Skip scans via hinting

● But need to load entire row to evaluate filter● New feature introduced in 0.94.5: FilterBase.isFamilyEssential(byte[] cfname)○ Filter will only load essential column families.○ In a 2nd pass all column families are loaded as needed

(i.e. if the included the rest of the row)● Currently implemented by SingleColumnValueFilter

Page 46: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Bulk Deletes

● Problem: Delete all column families, or columns, or column versions that are matched by a Scan.

Page 47: GC settings for HBase Obscure HBase features HBase ...files.meetup.com/1350427/HUG presentation - Apple(1).pdfHBASE-6603 - RegionMetricsStorage.incrNumericMetric is called too often

Bulk Deletes

● Problem: Delete all column families, or columns, or column versions that are matched by a Scan.

● BulkDelete coprocessor endpoint shipped with HBase addresses that.BulkDeleteResponse delete(Scan scan, byte deleteType, Long timestamp, int rowBatchSize)