Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.
-
Upload
olivia-wheeler -
Category
Documents
-
view
221 -
download
2
Transcript of Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.
HypertableHypertableDoug JuddDoug Judd
Zvents, Inc.Zvents, Inc.
hypertable.orghypertable.org
BackgroundBackground
Web 2.0 = Data ExplosionWeb 2.0 = Data Explosion
Web 1.0 Web 2.0
Web 1.0Web 2.0
Traditional ToolsTraditional ToolsDon’t Scale WellDon’t Scale Well
Designed for a single machineDesigned for a single machine Typical scaling solutionsTypical scaling solutions
ad-hocad-hoc manual/static resource allocationmanual/static resource allocation
hypertable.orghypertable.org
The Google StackThe Google Stack
Google File System (GFS)Google File System (GFS) Map-reduceMap-reduce BigtableBigtable
hypertable.orghypertable.org
Architectural OverviewArchitectural Overview
hypertable.orghypertable.org
What is Hypertable?What is Hypertable?
A open source high performance, scalable A open source high performance, scalable database, modelled after Google's Bigtabledatabase, modelled after Google's Bigtable
Not relationalNot relational Does not support transactionsDoes not support transactions
Hypertable Improvements Hypertable Improvements Over Traditional RDBMSOver Traditional RDBMS
Scalable Scalable High High randomrandom insert, update, and delete insert, update, and delete
raterate
hypertable.orghypertable.org
Data ModelData Model
Sparse, two-dimensional table with cell versionsSparse, two-dimensional table with cell versions Cells are identified by a 4-part keyCells are identified by a 4-part key
RowRow Column FamilyColumn Family Column QualifierColumn Qualifier TimestampTimestamp
hypertable.orghypertable.org
Table: Visual RepresentationTable: Visual Representation
hypertable.orghypertable.org
Table: Actual RepresentationTable: Actual Representation
hypertable.orghypertable.org
Anatomy of a KeyAnatomy of a Key
Row key is \0 terminatedRow key is \0 terminated Column Family is represented with 1 byteColumn Family is represented with 1 byte Column qualifier is \0 terminatedColumn qualifier is \0 terminated Timestamp is stored big-endian ones-complimentTimestamp is stored big-endian ones-compliment
ConcurrencyConcurrency
Bigtable uses copy-on-writeBigtable uses copy-on-write Hypertable uses a form of MVCCHypertable uses a form of MVCC
(multi-version concurrency control)(multi-version concurrency control) Deletes are carried out by inserting “delete” Deletes are carried out by inserting “delete”
records records
CellStoreCellStore
Sequence of 65K Sequence of 65K blocks of compressed blocks of compressed key/value pairskey/value pairs
System OverviewSystem Overview
hypertable.orghypertable.org
Range ServerRange Server
Manages ranges of table dataManages ranges of table data Caches updates in memory (CellCache)Caches updates in memory (CellCache) Periodically spills (compacts) cached updates to disk Periodically spills (compacts) cached updates to disk
(CellStore)(CellStore)
Client APIClient APIclass Client {
void create_table(const String &name, const String &schema);
Table *open_table(const String &name);
String get_schema(const String &name);
void get_tables(vector<String> &tables);
void drop_table(const String &name, bool if_exists);};
hypertable.orghypertable.org
Client API (cont.)Client API (cont.)class Table {
TableMutator *create_mutator();
TableScanner *create_scanner(ScanSpec &scan_spec);
};
class TableMutator {
void set(KeySpec &key, const void *value, int value_len);
void set_delete(KeySpec &key);
void flush();
};
class TableScanner {
bool next(CellT &cell);
};
Language BindingsLanguage Bindings
Currently C++ onlyCurrently C++ only Thrift BrokerThrift Broker
Write Ahead Commit LogWrite Ahead Commit Log
Persists all modifications (inserts and Persists all modifications (inserts and deletes)deletes)
Written into underlying DFSWritten into underlying DFS
Range Meta-Operation LogRange Meta-Operation Log
Facilitates Range meta operationFacilitates Range meta operation LoadsLoads SplitsSplits MovesMoves
Part of Master and RangeServerPart of Master and RangeServer Ensures Range state and location Ensures Range state and location
consistencyconsistency
hypertable.orghypertable.org
CompressionCompression
Cell Stores store compressed blocks of key/value Cell Stores store compressed blocks of key/value pairspairs
Commit Log stores compressed blocks of Commit Log stores compressed blocks of updatesupdates
Supported Compression SchemesSupported Compression Schemes zlib (--best and --fast)zlib (--best and --fast) lzolzo quicklzquicklz bmzbmz nonenone
hypertable.orghypertable.org
CachingCaching
Block CacheBlock Cache Caches CellStore blocksCaches CellStore blocks Blocks are cached uncompressedBlocks are cached uncompressed
Query CacheQuery Cache Caches query resultsCaches query results TBDTBD
Bloom FilterBloom Filter
Negative CacheNegative Cache Probabilistic data structureProbabilistic data structure Indicates if key is Indicates if key is notnot present present
hypertable.orghypertable.org
Scaling (part I)Scaling (part I)
hypertable.orghypertable.org
Scaling (part II)Scaling (part II)
hypertable.orghypertable.org
Scaling (part III)Scaling (part III)
hypertable.orghypertable.org
Access GroupsAccess Groups
Provides control of physical data layout -- Provides control of physical data layout -- hybrid row/column orientedhybrid row/column oriented
Improves performance by minimizing I/OImproves performance by minimizing I/O
CREATE TABLE crawldb {CREATE TABLE crawldb { Title MAX_VERSIONS=3, Title MAX_VERSIONS=3, Content MAX_VERSIONS=3, Content MAX_VERSIONS=3, PageRank MAX_VERSIONS=10, PageRank MAX_VERSIONS=10, ClickRank MAX_VERSIONS=10, ClickRank MAX_VERSIONS=10, ACCESS GROUP default (Title, Content), ACCESS GROUP default (Title, Content), ACCESS GROUP ranking (PageRank, ClickRank) ACCESS GROUP ranking (PageRank, ClickRank)}; };
hypertable.orghypertable.org
Filesystem Broker Filesystem Broker ArchitectureArchitecture
Hypertable can run on top of any distributed Hypertable can run on top of any distributed filesystem (e.g. Hadoop, KFS, etc.)filesystem (e.g. Hadoop, KFS, etc.)
Keys To PerformanceKeys To Performance
C++C++ Asynchronous communicationAsynchronous communication
hypertable.orghypertable.org
C++ vs. JavaC++ vs. Java
Hypertable is CPU intensiveHypertable is CPU intensive Manages large in-memory key/value mapManages large in-memory key/value map Alternate compression codecs (e.g. BMZ)Alternate compression codecs (e.g. BMZ)
Hypertable is memory intensiveHypertable is memory intensive Java uses 2-3 times the amount of memory to Java uses 2-3 times the amount of memory to
manage large in-memory map (e.g. TreeMap)manage large in-memory map (e.g. TreeMap) Poor processor cache performancePoor processor cache performance
hypertable.orghypertable.org
Performance TestPerformance Test(AOL Query Logs)(AOL Query Logs)
75,274,825 inserted cells75,274,825 inserted cells 8 node cluster8 node cluster
1 1.8 GHz Dual-core Opteron1 1.8 GHz Dual-core Opteron 4 GB RAM4 GB RAM 3 x 7200 RPM SATA drives3 x 7200 RPM SATA drives
Average row key: 7 bytesAverage row key: 7 bytes Average value: 15 bytesAverage value: 15 bytes Replication factor: 3Replication factor: 3 4 simultaneous insert clients4 simultaneous insert clients 500K 500K randomrandom inserts/s inserts/s 680K scanned cells/s680K scanned cells/s
hypertable.orghypertable.org
Performance Test IIPerformance Test II Simulated AOL query log dataSimulated AOL query log data 1TB data1TB data 9 node cluster9 node cluster
1 2.33 GHz quad-core Intel1 2.33 GHz quad-core Intel 16 GB RAM16 GB RAM 3 x 7200 RPM SATA drives3 x 7200 RPM SATA drives
Average row key: 9 bytesAverage row key: 9 bytes Average value: 18 bytesAverage value: 18 bytes Replication factor: 3Replication factor: 3 4 simultaneous insert clients4 simultaneous insert clients Over 1M Over 1M randomrandom inserts/s (sustained) inserts/s (sustained)
hypertable.orghypertable.org
WeaknessesWeaknesses
Range data managed by a single range Range data managed by a single range serverserver Though no data loss, can cause periods of Though no data loss, can cause periods of
unavailabilityunavailability Can be mitigated with client-side cache or Can be mitigated with client-side cache or
memcachedmemcached
hypertable.orghypertable.org
Project StatusProject Status
Currently in “alpha”Currently in “alpha” Just released version 0.9.0.7Just released version 0.9.0.7
Will release “beta” version end of AugustWill release “beta” version end of August Waiting on Hadoop JIRA 1700Waiting on Hadoop JIRA 1700
LicenseLicense
GPL 2.0GPL 2.0 Why not Apache?Why not Apache?
hypertable.orghypertable.org
Questions?Questions?
www.hypertable.orgwww.hypertable.org