HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on...

38
HBase Read Path [email protected]

Transcript of HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on...

Page 1: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

HBase Read [email protected]

Page 2: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Abstract❏ Client Side❏ Server Side❏ Tuning

Page 3: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Part-1 Client Side

Page 4: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

HBase Client ClientScanner

ClientScanner

cache(queue)scanner.next()RegionServer-0

RegionServer-1

RegionServer-2scanResultCache

ScannerCallableWithReplicas

1. RPC Request

2. RPC Response

3. Regroup and enqueue

4. Return a result

Step.1 + Step.2 + Step.3 = loadCache

Page 5: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Concepts in Scan● caching● batch● maxResultSize● allowPartialResults● limit● maxVersion● needCursorResult● filter● isolationLevel● asyncPrefetch

Page 6: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

caching

caching

maxResultSize

AllowPartialScanResultCache

Cell-1 Cell-2RowKey-1 Cell-3

Cell-4RowKey-2

Cell-5RowKey-3

Cell-7RowKey-4

Cell-6

Cell-8 Cell-9

Cell-1Result-1

Cell-4Result-3

Cell-5Result-4

Cell-7Result-5

Cell-6

Cell-8

Cell-2Result-2 Cell-3

Cell-9

Cell-1Result-1

Cell-4Result-3

Cell-5Result-4

Cell-7Result-5

Cell-6

Cell-8

Cell-2Result-2 Cell-3

RegionServer Row Data RPC Response Recieved from RS Results get from scanner.next()

Cell-9

Size of cell-1 > 1MB

scan.setCaching(2).setAllowPartialResults(true).setMaxResultSize(1MB)

One load cache loop

maxResultSize One load cache loop break because of maxResultSize limit

Caching One load cache loop break because of caching limit

Page 7: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Caching

maxResultSize

CompleteScanResultCache

Cell-1 Cell-2RowKey-1 Cell-3

Cell-4RowKey-2

Cell-5RowKey-3

Cell-7RowKey-4

Cell-6

Cell-8 Cell-9

Cell-1Result-1

Cell-4Result-3

Cell-5Result-4

Cell-7Result-5

Cell-6

Cell-8

Cell-2Result-2 Cell-3

RegionServer Row Data RPC Response Recieved from RS Results get from scanner.next()

Cell-1 Cell-2Result-1 Cell-3

Cell-4Result-2

Cell-5Result-3

Cell-7Result-4

Cell-6

Cell-8 Cell-9

Cell-9

Size of cell-1 > 1MB

scan.setCaching(2).setMaxResultSize(1MB)

One load cache loop

maxResultSize One load cache loop break because of maxResultSize limit

Caching One load cache loop break because of caching limit

Page 8: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Caching

Caching

Caching

BatchScanResultCache

Cell-1 Cell-2RowKey-1 Cell-3

Cell-4RowKey-2

Cell-5RowKey-3

Cell-7RowKey-4

Cell-6

Cell-8 Cell-9

Cell-1Result-1

Cell-4Result-3

Cell-5Result-4

Cell-7Result-5

Cell-6

Cell-8

Cell-2Result-2 Cell-3

Cell-9

RegionServer Row Data RPC Response Recieved from RS Results get from scanner.next()

Cell-1Result-1

Cell-4Result-3

Cell-5Result-4

Cell-7Result-5

Cell-6

Cell-8

Result-2 Cell-3

Result-6 Cell-9

Cell-2Size of cell-1 > 1MB

scan.setCaching(2).setBatch(2).setMaxResultSize(1MB)

One load cache loop

maxResultSize One load cache loop break because of maxResultSize limit

Caching One load cache loop break because of caching limit

Page 9: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Part-2 Server Side

Page 10: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)
Page 11: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Abstract In Server Side❏ High Efficiency❏ Query Diversity❏ Server Resource Protection

Page 12: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

KeyValue / Cell

keyLen(4B) valueLen(4B) rowkeyLen(2B) rowkeyBytes familyLen(1B) familyBytes qualifierBytes timestamp(8B) type(1B) valueBytes

Rowkey Family Qualifier Timestamp Type

KeyValue’s Key Part

KeyValue’s Value Part

Page 13: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

K-MergeSort

1311752

54321

1512963

1

1311752

54321

1512963

12

1311752

54321

1512963

1

22

A={N1, N2, ..., Nk} , So the complexity is ?

Page 14: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

KeyValueHeap

KeyValueHeap

RegionScanner

RegionScanner

StoreScanner (cf1)

StoreFileScanner

MemStoreScanner

StoreScanner (cf2)

MemStoreScanner

StoreFileScanner StoreFileScanner

KeyValueHeap

StoreFileScanner

MemStoreScanner MemStoreScanner

StoreFileScanner StoreFileScanner

StoreFileScanner

StoreScanner (cf3)

KeyValueHeapMemStoreScanner

MemStoreScanner

Page 15: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

RegionScanner

KeyValueHeap

KeyValueHeap

RegionScanner

StoreFileScanner

MemStoreScanner

StoreScanner

MemStoreScanner

StoreFileScanner StoreFileScanner

….

Page 16: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

RegionScanner

KeyValueHeap

KeyValueHeap

RegionScanner

StoreFileScanner

MemStoreScanner

StoreScanner

MemStoreScanner

StoreFileScanner StoreFileScanner

….

Page 17: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Revisit ConceptsBloomFilter

MSLAB

scan.setTimeRange

BucketCache

scan.setBatch

scan.setMaxVersion

scan.setRaw scan.setStartRow

scan.setStopRow

scan.setLimit scan.setReversed

scan.setFilter

HeartBeat

scan.setNeedCursorResult scan.setMaxResultSizescan.Quota

Column Qualifer

Family TTL

ScannerLeasescan.setCacheBlock

Page 18: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

RegionServer

Revisit Concepts

KeyValueHeap

KeyValueHeap

RegionScanner

StoreFileScanner

MemStoreScanner

StoreScanner

MemStoreScanner

StoreFileScanner StoreFileScanner

….

scan.Quota scan.setMaxResultSize

HeartBeat scan.setNeedCursorResult

scan.setStopRowscan.setStartRow

Filter/FilterList

scan.setTimeRange scan.setMaxVersion

BucketCache

MSLAB

BloomFilter

Filter/FilterList

scan.setRaw Column Qualifer Family TTL

scan.setReversed

scan.setLimit

ScannerLease

scan.setCacheBlock

Page 19: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Bloom Filter

1 : w may exist in the set.

f(w)=

0 : w does definitely not exist in the set.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Page 20: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

LruCache + onheap

Before GC (G1)

After GC (G1)

Memory allocated on Java heap

Free memory on Java heap

Page 21: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

BucketCache

Bucket-1(2MB)

BucketCache

4KB

4KB

4KB

4KB

…..

Bucket-2(2MB)

8KB

8KB

8KB

8KB

…..

Bucket-3(2MB)

512KB

512KB

512KB

512KB

…..

Block-1 (7KB) Block-2 (500KB)

…..

Block-4 (3KB) Block-5 (3KB)

Page 22: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

BucketCache + OnHeap

Before GC (G1)

bucket-4KB (2MB) bucket-8KB (2MB) bucket-16KB (2MB) bucket-32KB (2MB)

After GC (G1)

bucket-16KB (2MB) bucket-8KB (2MB) bucket-4KB (2MB) bucket-32KB (2MB)

Memory allocated on Java heap and occupied by cache block

Memory allocated on Java heap and with no data.

● Less fragment(Allocate 2MB one time) , Less Mixed GC.● JVM will never free BucketCache’s byte buffer.● But JVM will still sweep the buffer and compact them. (old generation)

Page 23: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

BucketCache + OffHeap

Before GC (G1)

bucket-4KB (2MB) bucket-8KB (2MB) bucket-16KB (2MB) bucket-32KB (2MB)

Memory allocated on Java heap and occupied by cache block

Memory allocated on Java heap and with no data.

After GC (G1)

bucket-4KB (2MB) bucket-8KB (2MB) bucket-16KB (2MB) bucket-32KB (2MB)

● JVM will NOT sweep the cache and compacting them (old generation)● Less mixed GC(s) and shorter STW time.

Page 24: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

End-to-end offheap on the read-path (HBASE-11425)

BucketCache

StoreFileScanner

Copy the Block from BucketCache(offheap) to onheap.

Rpc Handler

offheap

onheap

BucketCache

StoreFileScanner

Return a DirectByteBuffer(off, len) which reference to the offheap block.

Rpc Handler

offheap

onheap

Use the DirectByteBuffer in all the read path.

branch-1.5 branch-2

● No need to copy block from offheap to onheap.● Less onheap occupied.● Less mixed gc and shorter stw time.

Page 25: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Abstract In Server Side❏ High Efficiency❏ Query Diversity - TODO❏ Server Resource Protection

Page 26: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Abstract In Server Side❏ High Efficiency❏ Query Diversity❏ Server Resource Protection

Page 27: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

HFile

Response Result

HFile

Block-2

Block-1

Two Cases

Put-1

Delete-1

Delete-2

Delete-999

….

….

Block-100Put-2000

Block-2

Block-1Put-1

….

….

Block-100Put-2000

Put-2

Put-3

Put-999

Put-1

Put-2000

Response Result

Put-1

Put-2000

scan.setFilter(new SingleColumnValueFilter(...))Scan the HFile with so many deletes

Read too many block data into memory, which may cause GC or OOM

Page 28: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Server Side Limit● Data Size / Heap Size

○ MaxResultSize

● Timeout○ HeartBeat: abort this rpc once timeout and just return the current results to client.○ Cursor: return a fake result with the current rowkey for next rpc once timeout.

● Batch○ RS will still accumulate multiple results until reach max result size even if reach batch limit○ Related issue: HBASE-21206

● BlockSize ?

Page 29: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Part-3 Tuning

Page 30: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Tuning● Read Distribution● Locality● Short Circuit Read● CacheHitRatio● StoreFileCount● ReadRawCells / ResponseCells● Java GC● Scan Table OR Snapshot

Page 31: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Rows(10^11) + onheap(12g)/offHeap(12g) + Balanced + Locality(1.0) + MajorCompaction + disabledShortCircuitRead

Page 32: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Rows(10^11) + onheap(12g)/offHeap(12g) + Balanced + Locality(1.0) + MajorCompaction + enabledShortCircuitRead

Page 33: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Short Circuit Read● Disable Short Circuit Read

○ P99~120ms, but P999~1s.○ Serious impact on P999.○ One DN has 128 socket backlog, easy to happen “slow tcp connection”

● Enable Short Circuit Read○ P99~100ms, P999~250ms.○ Both QPS and latency are more stable.

Page 34: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

ReadRawCells / ResponseCells

398039155 / 30077 = 13234

Page 35: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

ReadRawCells / ResponseCells

Page 36: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

ReadRawCells / ResponseCells

4905492 / 27692 = 177

398039155 / 30077 = 13234

Page 37: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

ReadRawCells / ResponseCells● Solution

○ Scan the offline cluster’s snapshot directly.○ After the scan, we can get a rowkey subset .○ For each rowkey in subset, checkAndPut the online cluster.

Page 38: HBase Read Path - GitHub Pagesopeninx.github.io/ppt/hbase-read-path.pdf · End-to-end offheap on the read-path (HBASE-11425) BucketCache StoreFileScanner Copy the Block from BucketCache(offheap)

Thank You