BigTable And Hbase

BigTable & Hbase A Distributed Storage System for Structured Data Edward J. Yoon [email protected]

description

An architecture of BigTable/Hbase


Page 1: BigTable And Hbase

BigTable & Hbase A Distributed Storage System for Structured Data

Edward J. Yoon [email protected]

Page 2: BigTable And Hbase

Three Major Components

• Master Server

- Responsible for assigning tablets to tablet servers, detecting the addition and expiration of tablet servers, balancing tablet-server load, and garbage-collecting files in HDFS

- Handles schema changes such as table and column family (CF) creation

• Tablet Server

- Manages a set of tablets (10 to 1000 per tablet server)

- Handles read and write requests to its tablets

- Splits tablets that have grown too large (100-200 MB)

• Client Library

- Communicates directly with tablet servers for reads and writes
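
As a rough illustration of the split rule on this slide, the sketch below splits a tablet at its middle row key once its size passes a threshold. The `Tablet` class, the `maybe_split` function, and the choice of 200 MB as the threshold are assumptions made for this example; they are not taken from BigTable or Hbase code.

```python
from dataclasses import dataclass, field

# Assumed threshold: the upper end of the 100-200 MB range on the slide.
SPLIT_THRESHOLD_BYTES = 200 * 1024 * 1024


@dataclass
class Tablet:
    """Hypothetical model of a tablet: a set of rows keyed by row key."""
    rows: dict = field(default_factory=dict)  # row key (str) -> value (bytes)

    def size(self) -> int:
        # Approximate size as the total length of keys and values.
        return sum(len(k) + len(v) for k, v in self.rows.items())


def maybe_split(tablet, threshold=SPLIT_THRESHOLD_BYTES):
    """Return [tablet] if it is small enough, else two tablets split at the middle row key."""
    keys = sorted(tablet.rows)
    if tablet.size() <= threshold or len(keys) < 2:
        return [tablet]
    mid = len(keys) // 2
    left = Tablet({k: tablet.rows[k] for k in keys[:mid]})
    right = Tablet({k: tablet.rows[k] for k in keys[mid:]})
    return [left, right]
```

Splitting on a row key keeps each resulting tablet a contiguous row range, which is what lets a tablet server hand one half to another server.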

Page 3: BigTable And Hbase

Architecture

[Figure: three tablet servers, each shown with a GFS chunk server and a CMS client; a master server shown with the GFS master server and a CMS client; plus Chubby, a client, and a CMS server.]

• Chubby

- Handles master election

- Stores the bootstrap location of Hbase data

- Discovers region servers

- Stores access control lists

• Master Server

- Assigns tablets

- Detects the addition and expiration of tablet servers

- Balances tablet-server load

- Handles schema changes

• CMS server

- Schedules jobs

- Manages resources on the cluster

- Deals with machine failures

Page 4: BigTable And Hbase

Data Model

• Doesn’t support a full relational data model

• Multi-dimensional sorted map

• Indexed by a row key, column key, and timestamp

• Column-oriented storage - Most queries involve only a few columns out of many, so this greatly reduces I/O.

(row:string, column:string, time:int64) → string
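
To make the map signature concrete, here is a minimal in-memory sketch of a sorted, versioned map indexed by row, column, and timestamp. The class name `SortedMultiMap` and its methods are hypothetical and chosen only for this illustration; real tablets store this data in SSTables rather than Python dictionaries.

```python
class SortedMultiMap:
    """Toy model of (row, column, timestamp) -> value. Illustrative only."""

    def __init__(self):
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (the latest if None)."""
        versions = self._cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) in sorted key order, newest version first."""
        for row, column in sorted(self._cells):
            if start_row <= row < end_row:
                versions = self._cells[(row, column)]
                for ts in sorted(versions, reverse=True):
                    yield row, column, ts, versions[ts]


table = SortedMultiMap()
table.put("com.cnn.www", "contents:", 3, "v3")
table.put("com.cnn.www", "contents:", 5, "v5")
table.put("com.bbc.www", "anchor:x", 1, "a")
```

Because rows are visited in sorted order, a scan over a row range touches only neighbouring keys, which is the property the tablet abstraction relies on.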

Page 5: BigTable And Hbase

Tablet Location

• Uses a three-level hierarchy analogous to that of a B+ tree

- A location is the ip:port of the relevant server

- 1st level: Bootstrapped from the lock server; points to the location of the root tablet

- 2nd level: Uses META0 data to find the owner of the appropriate META1 tablet

- 3rd level: META1 tablets hold the locations of the tablets of all other tables
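
The three levels can be sketched as two index lookups after a bootstrap read. Everything below is a simplification under stated assumptions: the lock server is a plain dict, each server address maps directly to the index held by one metadata tablet, index entries are `(end_key, location)` pairs, and the `table:row` key format and the sample addresses are invented for the example.

```python
import bisect

LAST_KEY = "\uffff"  # assumed sentinel end key for the last tablet in an index


def find_in_index(index, key):
    """index is a sorted list of (end_key, location); return the first location whose end_key >= key."""
    end_keys = [end for end, _ in index]
    i = bisect.bisect_left(end_keys, key)
    if i == len(index):
        raise KeyError(key)
    return index[i][1]


def locate_tablet(lock_server, servers, table, row_key):
    """Resolve the ip:port of the server holding `row_key` of `table`."""
    meta_key = f"{table}:{row_key}"
    root_location = lock_server["root_tablet"]                        # 1st level
    meta1_location = find_in_index(servers[root_location], meta_key)  # 2nd level (META0)
    return find_in_index(servers[meta1_location], meta_key)           # 3rd level (META1)


# Hypothetical cluster state for illustration.
lock_server = {"root_tablet": "10.0.0.1:6000"}
servers = {
    "10.0.0.1:6000": [("users:m", "10.0.0.2:6000"), (LAST_KEY, "10.0.0.3:6000")],
    "10.0.0.2:6000": [("users:g", "10.0.0.10:6000"), ("users:m", "10.0.0.11:6000")],
    "10.0.0.3:6000": [("users:t", "10.0.0.12:6000"), (LAST_KEY, "10.0.0.13:6000")],
}
```

A client would normally cache the locations it resolves, so most reads and writes skip this walk entirely.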

Page 6: BigTable And Hbase

Tablet Assignment

The master keeps track of the set of live tablet servers and the current assignment of tablets to tablet servers (region servers in Hbase), including which tablets are unassigned.

[Figure: interaction between the cluster manager, the tablet (region) servers, Chubby, and the master server, in numbered steps.]

1) Start a server

2) Create a lock

3) Acquire the lock

4) Monitor

5) Assign tablets

6) Check lock status

8) Acquire and delete the lock

9) Reassign unassigned tablets
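
The lock-based bookkeeping on this slide can be sketched with an in-memory stand-in for the lock service. This is an assumption-heavy toy: `FakeLockService` and `Master` are hypothetical classes, a lost session is simulated by `expire`, and the least-loaded placement policy is invented for the example. It only shows the idea that a server counts as live while it holds its lock, and that the master reclaims tablets once it can take and delete that lock itself.

```python
class FakeLockService:
    """In-memory stand-in for Chubby: lock name -> current holder (None if released)."""

    def __init__(self):
        self.locks = {}

    def create_and_acquire(self, name, holder):
        if name in self.locks:
            return False
        self.locks[name] = holder
        return True

    def holder(self, name):
        return self.locks.get(name)

    def expire(self, name):
        # Simulate a tablet server losing its session: the lock file stays but is unheld.
        if name in self.locks:
            self.locks[name] = None

    def acquire_and_delete(self, name):
        """Succeed only if the lock exists and nobody holds it."""
        if name in self.locks and self.locks[name] is None:
            del self.locks[name]
            return True
        return False


class Master:
    def __init__(self, lock_service):
        self.lock_service = lock_service
        self.assignment = {}     # tablet -> server name
        self.unassigned = set()  # tablets with no live server

    def live_servers(self):
        return sorted(n for n, h in self.lock_service.locks.items() if h is not None)

    def assign(self, tablet):
        servers = self.live_servers()
        if not servers:
            self.unassigned.add(tablet)
            return None
        load = {s: 0 for s in servers}
        for s in self.assignment.values():
            if s in load:
                load[s] += 1
        target = min(servers, key=lambda s: load[s])  # assumed policy: least loaded
        self.assignment[tablet] = target
        self.unassigned.discard(tablet)
        return target

    def check_servers(self):
        """Check lock status, reclaim tablets of dead servers, then reassign them."""
        for server in sorted(set(self.assignment.values())):
            if self.lock_service.holder(server) != server:
                if self.lock_service.acquire_and_delete(server):
                    for t in sorted(t for t, s in self.assignment.items() if s == server):
                        del self.assignment[t]
                        self.unassigned.add(t)
        for tablet in sorted(self.unassigned):
            self.assign(tablet)
```

Deleting the lock matters as much as acquiring it: once the file is gone, the old server can never serve those tablets again, so a tablet is never served by two servers at once.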

Page 7: BigTable And Hbase

Tablet Serving

• To recover a tablet, a tablet server

- reads its metadata from the METADATA table

- the metadata contains

  - the list of SSTables that comprise the tablet

  - a set of redo points, which are pointers into any commit logs that may contain data for the tablet

- reads the indices of the SSTables into memory

- reconstructs the memtable by applying all of the updates that have committed since the redo points
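
The recovery steps can be sketched as a log replay. The sketch assumes a single commit log with one integer redo point, whereas the slide speaks of a set of redo points across possibly several logs; `recover_tablet` and the dictionary layouts are hypothetical.

```python
def recover_tablet(metadata_row, commit_log, sstable_indices):
    """Rebuild the in-memory state of a tablet.

    metadata_row: {"sstables": [names], "redo_point": int}  (assumed layout)
    commit_log: list of (sequence_number, row, column, value), in sequence order
    sstable_indices: name -> index object stored alongside each SSTable
    """
    # Load the index of every SSTable that makes up the tablet.
    loaded = {name: sstable_indices[name] for name in metadata_row["sstables"]}

    # Replay every update committed at or after the redo point into a fresh memtable.
    memtable = {}
    for seq, row, column, value in commit_log:
        if seq >= metadata_row["redo_point"]:
            memtable[(row, column)] = value
    return loaded, memtable


# Hypothetical state for illustration.
metadata_row = {"sstables": ["sst-1", "sst-2"], "redo_point": 3}
commit_log = [
    (1, "r1", "c", "old"),  # already persisted in an SSTable, skipped
    (3, "r1", "c", "new"),
    (4, "r2", "c", "x"),
]
sstable_indices = {"sst-1": ["a", "m"], "sst-2": ["n", "z"], "sst-9": ["unused"]}
```

Updates before the redo point are skipped because they already live in one of the listed SSTables.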

Page 8: BigTable And Hbase

Compaction

[Figure: read ops and write ops against the memtable and frozen memtable in memory, and the tablet log and SSTable files in DFS, with data versions labelled V1.0 to V6.0.]

• Minor compaction

- Memtable -> a new SSTable

- A new memtable is created

• Merging compaction

- Memtable + a few SSTables -> a new SSTable

- Done periodically

- Deleted data are still alive

• Major compaction

- Memtable + all SSTables -> one SSTable

- Deleted data are removed

- Storage can be re-used
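
The difference between the three compaction kinds can be sketched with dictionaries standing in for the memtable and SSTables. The `TOMBSTONE` marker, the function names, and the newest-first list of SSTables are assumptions of this example.

```python
TOMBSTONE = object()  # assumed marker recording that a key was deleted


def merge(tables):
    """Merge tables given newest first; the newest entry for each key wins."""
    merged = {}
    for table in reversed(tables):  # apply oldest first so newer entries overwrite
        merged.update(table)
    return dict(sorted(merged.items()))


def minor_compaction(memtable, sstables):
    """Write the memtable out as a new SSTable and start a fresh memtable."""
    return {}, [dict(sorted(memtable.items()))] + sstables


def merging_compaction(memtable, sstables, count):
    """Merge the memtable with the `count` newest SSTables.

    Tombstones are kept, because older SSTables may still hold the deleted data.
    """
    return {}, [merge([memtable] + sstables[:count])] + sstables[count:]


def major_compaction(memtable, sstables):
    """Merge everything into one SSTable and drop deleted entries for good."""
    merged = merge([memtable] + sstables)
    return {}, [{k: v for k, v in merged.items() if v is not TOMBSTONE}]


# Hypothetical state: "a" was deleted after being written to an older SSTable.
memtable = {"b": "b2", "a": TOMBSTONE}
sstables = [{"a": "a1", "c": "c1"}, {"b": "b0"}]  # newest first
```

Only the major compaction can discard tombstones safely, since it is the only one that sees every SSTable that might still contain the deleted value.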

Page 9: BigTable And Hbase

Compression

• Clients can control whether or not the SSTables for a locality group are compressed

• Two-pass custom compression scheme

- First pass: compresses long common strings across a large window (BMDiff)

- Second pass: looks for repetitions in a small 16 KB window (Zippy)

- Both compression passes are very fast

- Space reduction

• Makes it possible to identify large amounts of shared boilerplate in pages from the same host

- Clients choose their row names so that similar data ends up clustered and therefore achieve very good compression ratios
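
One way to choose row names so that pages from the same host end up next to each other is to reverse the hostname components. The sketch below is an illustration of that idea; the `row_key` function and the sample URLs are assumptions, and it uses only the standard `urllib.parse.urlsplit` call.

```python
from urllib.parse import urlsplit


def row_key(url):
    """Build a row key with the hostname components reversed, e.g. com.cnn.www/index.html."""
    parts = urlsplit(url)
    reversed_host = ".".join(reversed(parts.hostname.split(".")))
    return reversed_host + parts.path


# Hypothetical URLs for illustration.
urls = [
    "http://www.cnn.com/world",
    "http://maps.google.com/",
    "http://www.cnn.com/index.html",
    "http://www.google.com/about",
]
keys = sorted(row_key(u) for u in urls)
```

After sorting, both cnn.com pages sit together and both google.com pages sit together, so a compressor working over neighbouring rows sees their shared boilerplate.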

Page 10: BigTable And Hbase

Caching for read performance

• Uses two levels of caching to improve read performance

• Scan cache

- Higher-level cache

- Most useful for applications that tend to read the same data repeatedly

• Block cache

- Lower-level cache

- Useful for sequential reads or random reads of different columns in the same locality group within a hot row
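
A minimal sketch of the two cache levels follows, assuming the scan cache holds individual key-value pairs and the block cache holds whole blocks read from storage. `LRUCache`, `CachedReader`, the LRU eviction policy, and the capacities are all choices made for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache (eviction policy assumed for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class CachedReader:
    def __init__(self, read_block, block_of, scan_capacity=1024, block_capacity=64):
        self.read_block = read_block  # block id -> dict of key -> value (simulated disk read)
        self.block_of = block_of      # key -> block id
        self.scan_cache = LRUCache(scan_capacity)    # higher level: key-value pairs
        self.block_cache = LRUCache(block_capacity)  # lower level: whole blocks
        self.disk_reads = 0

    def get(self, key):
        value = self.scan_cache.get(key)
        if value is not None:
            return value  # repeated read of the same data
        block_id = self.block_of(key)
        block = self.block_cache.get(block_id)
        if block is None:
            block = self.read_block(block_id)
            self.disk_reads += 1
            self.block_cache.put(block_id, block)
        value = block.get(key)
        if value is not None:
            self.scan_cache.put(key, value)
        return value


# Hypothetical layout: both columns of row1 live in block 0.
blocks = {0: {"row1/a": "1", "row1/b": "2"}, 1: {"row2/a": "3"}}
reader = CachedReader(blocks.__getitem__, lambda k: 0 if k.startswith("row1") else 1)
```

Reading `row1/a` and then `row1/b` costs one disk read, because the second key is served from the cached block; reading `row1/a` again is served from the scan cache.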

Page 11: BigTable And Hbase

Hbase : BigTable clone project

• http://hadoop.apache.org/hbase/

• Written in Java

• We do not have Chubby or a CMS server; we have the JobTracker, and ZooKeeper is coming soon.

• Because Hadoop's file system (the GFS counterpart) doesn't provide a file-append function, the current Hbase can lose data when it crashes.

- Hadoop 0.19.x provides a file-append function