Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

37
Hypertable Hypertable Doug Judd Doug Judd Zvents, Inc. Zvents, Inc.

Transcript of Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

Page 1: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

HypertableHypertableDoug JuddDoug Judd

Zvents, Inc.Zvents, Inc.

Page 2: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

BackgroundBackground

Page 3: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

Web 2.0 = Data ExplosionWeb 2.0 = Data Explosion

Web 1.0 Web 2.0

Web 1.0Web 2.0

Page 4: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

Traditional ToolsTraditional ToolsDon’t Scale WellDon’t Scale Well

Designed for a single machineDesigned for a single machine Typical scaling solutionsTypical scaling solutions

ad-hocad-hoc manual/static resource allocationmanual/static resource allocation

Page 5: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

The Google StackThe Google Stack

Google File System (GFS)Google File System (GFS) Map-reduceMap-reduce BigtableBigtable

Page 6: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Architectural OverviewArchitectural Overview

Page 7: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

What is Hypertable?What is Hypertable?

A open source high performance, scalable A open source high performance, scalable database, modelled after Google's Bigtabledatabase, modelled after Google's Bigtable

Not relationalNot relational Does not support transactionsDoes not support transactions

Page 8: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

Hypertable Improvements Hypertable Improvements Over Traditional RDBMSOver Traditional RDBMS

Scalable Scalable High High randomrandom insert, update, and delete insert, update, and delete

raterate

Page 9: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Data ModelData Model

Sparse, two-dimensional table with cell versionsSparse, two-dimensional table with cell versions Cells are identified by a 4-part keyCells are identified by a 4-part key

RowRow Column FamilyColumn Family Column QualifierColumn Qualifier TimestampTimestamp

Page 10: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Table: Visual RepresentationTable: Visual Representation

Page 11: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Table: Actual RepresentationTable: Actual Representation

Page 12: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Anatomy of a KeyAnatomy of a Key

Row key is \0 terminatedRow key is \0 terminated Column Family is represented with 1 byteColumn Family is represented with 1 byte Column qualifier is \0 terminatedColumn qualifier is \0 terminated Timestamp is stored big-endian ones-complimentTimestamp is stored big-endian ones-compliment

Page 13: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

ConcurrencyConcurrency

Bigtable uses copy-on-writeBigtable uses copy-on-write Hypertable uses a form of MVCCHypertable uses a form of MVCC

(multi-version concurrency control)(multi-version concurrency control) Deletes are carried out by inserting “delete” Deletes are carried out by inserting “delete”

records records

Page 14: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

CellStoreCellStore

Sequence of 65K Sequence of 65K blocks of compressed blocks of compressed key/value pairskey/value pairs

Page 15: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

System OverviewSystem Overview

Page 16: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Range ServerRange Server

Manages ranges of table dataManages ranges of table data Caches updates in memory (CellCache)Caches updates in memory (CellCache) Periodically spills (compacts) cached updates to disk Periodically spills (compacts) cached updates to disk

(CellStore)(CellStore)

Page 17: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

Client APIClient APIclass Client {

void create_table(const String &name, const String &schema);

Table *open_table(const String &name);

String get_schema(const String &name);

void get_tables(vector<String> &tables);

void drop_table(const String &name, bool if_exists);};

Page 18: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Client API (cont.)Client API (cont.)class Table {

TableMutator *create_mutator();

TableScanner *create_scanner(ScanSpec &scan_spec);

};

class TableMutator {

void set(KeySpec &key, const void *value, int value_len);

void set_delete(KeySpec &key);

void flush();

};

class TableScanner {

bool next(CellT &cell);

};

Page 19: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

Language BindingsLanguage Bindings

Currently C++ onlyCurrently C++ only Thrift BrokerThrift Broker

Page 20: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

Write Ahead Commit LogWrite Ahead Commit Log

Persists all modifications (inserts and Persists all modifications (inserts and deletes)deletes)

Written into underlying DFSWritten into underlying DFS

Page 21: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

Range Meta-Operation LogRange Meta-Operation Log

Facilitates Range meta operationFacilitates Range meta operation LoadsLoads SplitsSplits MovesMoves

Part of Master and RangeServerPart of Master and RangeServer Ensures Range state and location Ensures Range state and location

consistencyconsistency

Page 22: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

CompressionCompression

Cell Stores store compressed blocks of key/value Cell Stores store compressed blocks of key/value pairspairs

Commit Log stores compressed blocks of Commit Log stores compressed blocks of updatesupdates

Supported Compression SchemesSupported Compression Schemes zlib (--best and --fast)zlib (--best and --fast) lzolzo quicklzquicklz bmzbmz nonenone

Page 23: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

CachingCaching

Block CacheBlock Cache Caches CellStore blocksCaches CellStore blocks Blocks are cached uncompressedBlocks are cached uncompressed

Query CacheQuery Cache Caches query resultsCaches query results TBDTBD

Page 24: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

Bloom FilterBloom Filter

Negative CacheNegative Cache Probabilistic data structureProbabilistic data structure Indicates if key is Indicates if key is notnot present present

Page 25: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Scaling (part I)Scaling (part I)

Page 26: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Scaling (part II)Scaling (part II)

Page 27: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Scaling (part III)Scaling (part III)

Page 28: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Access GroupsAccess Groups

Provides control of physical data layout -- Provides control of physical data layout -- hybrid row/column orientedhybrid row/column oriented

Improves performance by minimizing I/OImproves performance by minimizing I/O

CREATE TABLE crawldb {CREATE TABLE crawldb { Title MAX_VERSIONS=3, Title MAX_VERSIONS=3, Content MAX_VERSIONS=3, Content MAX_VERSIONS=3, PageRank MAX_VERSIONS=10, PageRank MAX_VERSIONS=10, ClickRank MAX_VERSIONS=10, ClickRank MAX_VERSIONS=10, ACCESS GROUP default (Title, Content), ACCESS GROUP default (Title, Content), ACCESS GROUP ranking (PageRank, ClickRank) ACCESS GROUP ranking (PageRank, ClickRank)}; };

Page 29: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Filesystem Broker Filesystem Broker ArchitectureArchitecture

Hypertable can run on top of any distributed Hypertable can run on top of any distributed filesystem (e.g. Hadoop, KFS, etc.)filesystem (e.g. Hadoop, KFS, etc.)

Page 30: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

Keys To PerformanceKeys To Performance

C++C++ Asynchronous communicationAsynchronous communication

Page 31: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

C++ vs. JavaC++ vs. Java

Hypertable is CPU intensiveHypertable is CPU intensive Manages large in-memory key/value mapManages large in-memory key/value map Alternate compression codecs (e.g. BMZ)Alternate compression codecs (e.g. BMZ)

Hypertable is memory intensiveHypertable is memory intensive Java uses 2-3 times the amount of memory to Java uses 2-3 times the amount of memory to

manage large in-memory map (e.g. TreeMap)manage large in-memory map (e.g. TreeMap) Poor processor cache performancePoor processor cache performance

Page 32: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Performance TestPerformance Test(AOL Query Logs)(AOL Query Logs)

75,274,825 inserted cells75,274,825 inserted cells 8 node cluster8 node cluster

1 1.8 GHz Dual-core Opteron1 1.8 GHz Dual-core Opteron 4 GB RAM4 GB RAM 3 x 7200 RPM SATA drives3 x 7200 RPM SATA drives

Average row key: 7 bytesAverage row key: 7 bytes Average value: 15 bytesAverage value: 15 bytes Replication factor: 3Replication factor: 3 4 simultaneous insert clients4 simultaneous insert clients 500K 500K randomrandom inserts/s inserts/s 680K scanned cells/s680K scanned cells/s

Page 33: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Performance Test IIPerformance Test II Simulated AOL query log dataSimulated AOL query log data 1TB data1TB data 9 node cluster9 node cluster

1 2.33 GHz quad-core Intel1 2.33 GHz quad-core Intel 16 GB RAM16 GB RAM 3 x 7200 RPM SATA drives3 x 7200 RPM SATA drives

Average row key: 9 bytesAverage row key: 9 bytes Average value: 18 bytesAverage value: 18 bytes Replication factor: 3Replication factor: 3 4 simultaneous insert clients4 simultaneous insert clients Over 1M Over 1M randomrandom inserts/s (sustained) inserts/s (sustained)

Page 34: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

WeaknessesWeaknesses

Range data managed by a single range Range data managed by a single range serverserver Though no data loss, can cause periods of Though no data loss, can cause periods of

unavailabilityunavailability Can be mitigated with client-side cache or Can be mitigated with client-side cache or

memcachedmemcached

Page 35: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Project StatusProject Status

Currently in “alpha”Currently in “alpha” Just released version 0.9.0.7Just released version 0.9.0.7

Will release “beta” version end of AugustWill release “beta” version end of August Waiting on Hadoop JIRA 1700Waiting on Hadoop JIRA 1700

Page 36: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

LicenseLicense

GPL 2.0GPL 2.0 Why not Apache?Why not Apache?

Page 37: Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

hypertable.orghypertable.org

Questions?Questions?

www.hypertable.orgwww.hypertable.org