Hypertable An Open Source, High Performance, Scalable Database

37
Hypertable Hypertable Doug Judd Doug Judd Zvents, Inc. Zvents, Inc.

Transcript of Hypertable An Open Source, High Performance, Scalable Database

Page 1: Hypertable An Open Source, High Performance, Scalable Database

HypertableHypertableDoug JuddDoug Judd

Zvents, Inc.Zvents, Inc.

Page 2: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

BackgroundBackground

Page 3: Hypertable An Open Source, High Performance, Scalable Database

Web 2.0 = Data ExplosionWeb 2.0 = Data Explosion

Web 1.0 Web 2.0

Web 1.0Web 2.0

Page 4: Hypertable An Open Source, High Performance, Scalable Database

Traditional ToolsTraditional ToolsDon’t Scale WellDon’t Scale Well

Designed for a single machineDesigned for a single machine Typical scaling solutionsTypical scaling solutions

ad-hocad-hoc manual/static resource allocationmanual/static resource allocation

Page 5: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

The Google StackThe Google Stack

Google File System (GFS)Google File System (GFS) Map-reduceMap-reduce BigtableBigtable

Page 6: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Architectural OverviewArchitectural Overview

Page 7: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

What is Hypertable?What is Hypertable?

A open source high performance, scalable A open source high performance, scalable database, modelled after Google's Bigtabledatabase, modelled after Google's Bigtable

Not relationalNot relational Does not support transactionsDoes not support transactions

Page 8: Hypertable An Open Source, High Performance, Scalable Database

Hypertable Improvements Hypertable Improvements Over Traditional RDBMSOver Traditional RDBMS

Scalable Scalable High High randomrandom insert, update, and delete insert, update, and delete

raterate

Page 9: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Data ModelData Model

Sparse, two-dimensional table with cell versionsSparse, two-dimensional table with cell versions Cells are identified by a 4-part keyCells are identified by a 4-part key

RowRow Column FamilyColumn Family Column QualifierColumn Qualifier TimestampTimestamp

Page 10: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Table: Visual RepresentationTable: Visual Representation

Page 11: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Table: Actual RepresentationTable: Actual Representation

Page 12: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Anatomy of a KeyAnatomy of a Key

Row key is \0 terminatedRow key is \0 terminated Column Family is represented with 1 byteColumn Family is represented with 1 byte Column qualifier is \0 terminatedColumn qualifier is \0 terminated Timestamp is stored big-endian ones-complimentTimestamp is stored big-endian ones-compliment

Page 13: Hypertable An Open Source, High Performance, Scalable Database

ConcurrencyConcurrency

Bigtable uses copy-on-writeBigtable uses copy-on-write Hypertable uses a form of MVCCHypertable uses a form of MVCC

(multi-version concurrency control)(multi-version concurrency control) Deletes are carried out by inserting “delete” Deletes are carried out by inserting “delete”

records records

Page 14: Hypertable An Open Source, High Performance, Scalable Database

CellStoreCellStore

Sequence of 65K Sequence of 65K blocks of compressed blocks of compressed key/value pairskey/value pairs

Page 15: Hypertable An Open Source, High Performance, Scalable Database

System OverviewSystem Overview

Page 16: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Range ServerRange Server

Manages ranges of table dataManages ranges of table data Caches updates in memory (CellCache)Caches updates in memory (CellCache) Periodically spills (compacts) cached updates to disk Periodically spills (compacts) cached updates to disk

(CellStore)(CellStore)

Page 17: Hypertable An Open Source, High Performance, Scalable Database

Client APIClient APIclass Client {

void create_table(const String &name, const String &schema);

Table *open_table(const String &name);

String get_schema(const String &name);

void get_tables(vector<String> &tables);

void drop_table(const String &name, bool if_exists);};

Page 18: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Client API (cont.)Client API (cont.)class Table {

TableMutator *create_mutator();

TableScanner *create_scanner(ScanSpec &scan_spec);

};

class TableMutator {

void set(KeySpec &key, const void *value, int value_len);

void set_delete(KeySpec &key);

void flush();

};

class TableScanner {

bool next(CellT &cell);

};

Page 19: Hypertable An Open Source, High Performance, Scalable Database

Language BindingsLanguage Bindings

Currently C++ onlyCurrently C++ only Thrift BrokerThrift Broker

Page 20: Hypertable An Open Source, High Performance, Scalable Database

Write Ahead Commit LogWrite Ahead Commit Log

Persists all modifications (inserts and Persists all modifications (inserts and deletes)deletes)

Written into underlying DFSWritten into underlying DFS

Page 21: Hypertable An Open Source, High Performance, Scalable Database

Range Meta-Operation LogRange Meta-Operation Log

Facilitates Range meta operationFacilitates Range meta operation LoadsLoads SplitsSplits MovesMoves

Part of Master and RangeServerPart of Master and RangeServer Ensures Range state and location Ensures Range state and location

consistencyconsistency

Page 22: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

CompressionCompression

Cell Stores store compressed blocks of key/value Cell Stores store compressed blocks of key/value pairspairs

Commit Log stores compressed blocks of Commit Log stores compressed blocks of updatesupdates

Supported Compression SchemesSupported Compression Schemes zlib (--best and --fast)zlib (--best and --fast) lzolzo quicklzquicklz bmzbmz nonenone

Page 23: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

CachingCaching

Block CacheBlock Cache Caches CellStore blocksCaches CellStore blocks Blocks are cached uncompressedBlocks are cached uncompressed

Query CacheQuery Cache Caches query resultsCaches query results TBDTBD

Page 24: Hypertable An Open Source, High Performance, Scalable Database

Bloom FilterBloom Filter

Negative CacheNegative Cache Probabilistic data structureProbabilistic data structure Indicates if key is Indicates if key is notnot present present

Page 25: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Scaling (part I)Scaling (part I)

Page 26: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Scaling (part II)Scaling (part II)

Page 27: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Scaling (part III)Scaling (part III)

Page 28: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Access GroupsAccess Groups

Provides control of physical data layout -- Provides control of physical data layout -- hybrid row/column orientedhybrid row/column oriented

Improves performance by minimizing I/OImproves performance by minimizing I/O

CREATE TABLE crawldb {CREATE TABLE crawldb { Title MAX_VERSIONS=3, Title MAX_VERSIONS=3, Content MAX_VERSIONS=3, Content MAX_VERSIONS=3, PageRank MAX_VERSIONS=10, PageRank MAX_VERSIONS=10, ClickRank MAX_VERSIONS=10, ClickRank MAX_VERSIONS=10, ACCESS GROUP default (Title, Content), ACCESS GROUP default (Title, Content), ACCESS GROUP ranking (PageRank, ClickRank) ACCESS GROUP ranking (PageRank, ClickRank)}; };

Page 29: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Filesystem Broker Filesystem Broker ArchitectureArchitecture

Hypertable can run on top of any distributed Hypertable can run on top of any distributed filesystem (e.g. Hadoop, KFS, etc.)filesystem (e.g. Hadoop, KFS, etc.)

Page 30: Hypertable An Open Source, High Performance, Scalable Database

Keys To PerformanceKeys To Performance

C++C++ Asynchronous communicationAsynchronous communication

Page 31: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

C++ vs. JavaC++ vs. Java

Hypertable is CPU intensiveHypertable is CPU intensive Manages large in-memory key/value mapManages large in-memory key/value map Alternate compression codecs (e.g. BMZ)Alternate compression codecs (e.g. BMZ)

Hypertable is memory intensiveHypertable is memory intensive Java uses 2-3 times the amount of memory to Java uses 2-3 times the amount of memory to

manage large in-memory map (e.g. TreeMap)manage large in-memory map (e.g. TreeMap) Poor processor cache performancePoor processor cache performance

Page 32: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Performance TestPerformance Test(AOL Query Logs)(AOL Query Logs)

75,274,825 inserted cells75,274,825 inserted cells 8 node cluster8 node cluster

1 1.8 GHz Dual-core Opteron1 1.8 GHz Dual-core Opteron 4 GB RAM4 GB RAM 3 x 7200 RPM SATA drives3 x 7200 RPM SATA drives

Average row key: 7 bytesAverage row key: 7 bytes Average value: 15 bytesAverage value: 15 bytes Replication factor: 3Replication factor: 3 4 simultaneous insert clients4 simultaneous insert clients 500K 500K randomrandom inserts/s inserts/s 680K scanned cells/s680K scanned cells/s

Page 33: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Performance Test IIPerformance Test II Simulated AOL query log dataSimulated AOL query log data 1TB data1TB data 9 node cluster9 node cluster

1 2.33 GHz quad-core Intel1 2.33 GHz quad-core Intel 16 GB RAM16 GB RAM 3 x 7200 RPM SATA drives3 x 7200 RPM SATA drives

Average row key: 9 bytesAverage row key: 9 bytes Average value: 18 bytesAverage value: 18 bytes Replication factor: 3Replication factor: 3 4 simultaneous insert clients4 simultaneous insert clients Over 1M Over 1M randomrandom inserts/s (sustained) inserts/s (sustained)

Page 34: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

WeaknessesWeaknesses

Range data managed by a single range Range data managed by a single range serverserver Though no data loss, can cause periods of Though no data loss, can cause periods of

unavailabilityunavailability Can be mitigated with client-side cache or Can be mitigated with client-side cache or

memcachedmemcached

Page 35: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Project StatusProject Status

Currently in “alpha”Currently in “alpha” Just released version 0.9.0.7Just released version 0.9.0.7

Will release “beta” version end of AugustWill release “beta” version end of August Waiting on Hadoop JIRA 1700Waiting on Hadoop JIRA 1700

Page 36: Hypertable An Open Source, High Performance, Scalable Database

LicenseLicense

GPL 2.0GPL 2.0 Why not Apache?Why not Apache?

Page 37: Hypertable An Open Source, High Performance, Scalable Database

hypertable.orghypertable.org

Questions?Questions?

www.hypertable.orgwww.hypertable.org