HyperDex - A Closer Look

Dr. Stefan Schadwinkel Co-Founder / Analytics Engineer

[email protected]

–  DECK36 is a young spin-off from ICANS

–  Small team of 7 engineers

–  Longstanding expertise in designing, implementing and operating complex web systems

–  Developing own data intelligence-focused tools and web services

–  Offering our expert knowledge in:

–  Automation & Operations

–  Architecture & Engineering

–  Analytics & Data Logistics

BACKGROUND *log: Storm-based Analytics RT

BACKGROUND *log
Our *log provides stream-based real-time analytics. We need a serious DB. It must focus on servicing each request, scale easily and fast, deliver consistent throughput, offer secondary indices, and make it possible to compute aggregations.
MongoDB, Cassandra, Riak, MariaDB
HyperDex: A Distributed, Searchable Key-Value Store. Robert Escriva, Bernard Wong and Emin Gün Sirer. In Proceedings of the SIGCOMM Conference, Helsinki, Finland, August 2012. http://hyperdex.org/papers/hyperdex.pdf

WHY HYPERDEX? Next Generation K/V

WHY HYPERDEX? Features.
CAP - Common Buzz: Consistent, Available, Partition-tolerant. Pick any two.
From http://hyperdex.org/FAQ/: HyperDex is designed to withstand a threshold of failures desired by the application. The level of fault-tolerance is tunable by the system administrator. HyperDex guarantees consistency, availability in the presence of less than f faults, and partition tolerance for partitions that affect less than f nodes, where f is a user-tunable parameter.
-  Fully linearizable. Every ‘get’ always returns the latest ‘put’.
-  Tolerates up to f failures.
-  Query secondary attributes almost as fast as the primary key.
-  Rich data types: Strings, Floats, Ints, Lists, Maps, Sets
-  Atomic, multi-key transactions. (Commercial)

HOW? Hyperspace Hashing

HYPERSPACE HASHING Mapping Data into Euclidean Space Each object is mapped into space. Space is mapped onto servers. One hyperspace relates to one table. HyperDex can manage multiple independent hyperspaces.
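The mapping described on this slide can be sketched in a few lines of Python. This is only an illustration, not HyperDex code: the function names, the SHA-1 hash, the two partitions per axis, and the sample record are all assumptions made for the example. Each attribute becomes one axis, the hashed values give a grid cell (region), and regions are assigned to servers.

```python
import hashlib
from itertools import product


def axis_coordinate(value, partitions):
    """Hash one attribute value onto one axis: a cell index in [0, partitions)."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def region_of(obj, attributes, partitions=2):
    """Return the grid cell (one coordinate per attribute) that holds obj."""
    return tuple(axis_coordinate(obj[a], partitions) for a in attributes)


def regions_for_search(query, attributes, partitions=2):
    """Return every region that may hold a match for a partial query.

    Axes fixed by the query contribute one coordinate; all other axes
    contribute every coordinate.
    """
    axes = [
        [axis_coordinate(query[a], partitions)] if a in query else range(partitions)
        for a in attributes
    ]
    return list(product(*axes))


def server_of(region, servers, partitions=2):
    """Assign a region to a server by flattening its coordinates to one index."""
    index = 0
    for coordinate in region:
        index = index * partitions + coordinate
    return servers[index % len(servers)]


# Hypothetical table with three attributes and eight servers.
attributes = ("first", "last", "phone")
servers = ["node%d" % i for i in range(8)]
obj = {"first": "John", "last": "Smith", "phone": 6075551024}

region = region_of(obj, attributes)
candidates = regions_for_search({"last": "Smith"}, attributes)
print(region, server_of(region, servers))
print(len(candidates), "of", 2 ** len(attributes), "regions must be searched")
```

Fixing one of three attributes halves the number of regions a search has to touch, which is the intuition behind fast searches on secondary attributes.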

HYPERSPACE HASHING So far, so good. Aww, wait! The curse of dimensionality. The volume of the resulting hyperspace grows exponentially in the number of dimensions/attributes. For instance, a table with 9 dimensions requires 2^9 regions. That’s a minimum of 512 servers.

HYPERSPACE HASHING Logarithms to the rescue! Subspaces. HyperDex splits the hyperspace into multiple lower-dimensional subspaces. Thus, the volume of the space grows only linearly. Not only does this reduce the number of machines required to store the data, search also becomes more efficient, because fewer machines need to be contacted. A key subspace is added to distinguish key lookups from single-attribute searches. Each subspace stores a full copy of the object.
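A quick back-of-the-envelope calculation shows the effect of subspaces on the 9-attribute table from the previous slide. The split into three subspaces of three attributes each, and the two partitions per axis, are assumptions chosen for the example; the function names are hypothetical.

```python
def regions_full_space(dimensions, partitions=2):
    """Regions in one hyperspace spanning all attributes."""
    return partitions ** dimensions


def regions_with_subspaces(subspace_sizes, partitions=2):
    """Total regions when the attributes are split into independent subspaces."""
    return sum(partitions ** size for size in subspace_sizes)


full = regions_full_space(9)
split = regions_with_subspaces([3, 3, 3])
with_key = regions_with_subspaces([1, 3, 3, 3])  # plus a one-dimensional key subspace

print(full)      # 512
print(split)     # 24
print(with_key)  # 26
```

The price is storage: with four subspaces, each object is stored four times, once per subspace.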

HOW? Value-dependent Chaining

VALUE-DEPENDENT CHAINING Consistency and Replication. We have copies of each object in each subspace. Value-dependent chaining keeps all copies consistent and provides strong consistency (linearizability) and fault tolerance in the presence of concurrent updates.

VALUE-DEPENDENT CHAINING Consistency. HyperDex propagates each update deterministically to all relevant spaces.

Update u1: PUT (insert key)
-  h1, h2, h3
Chains are executed from the end. Head = Point leader. The same for each key. The point leader knows all updates. Dependencies are embedded in the chain.
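How a chain such as h1, h2, h3 could be derived is sketched below. This is a simplified model, not the HyperDex implementation: `cell` hashes a whole tuple of values to a region label instead of hashing each attribute onto its own axis, and the subspace layout and sample record are hypothetical. The head depends only on the key, so it is the same for every update to that key. Following the HyperDex paper, an update that moves an object within a subspace visits both the old and the new region.

```python
import hashlib


def cell(values, partitions=4):
    """Hash a tuple of attribute values to a region label within one subspace."""
    digest = hashlib.sha1(repr(tuple(values)).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partitions


def build_chain(key, old, new, subspaces):
    """Return the ordered (subspace, region) hops for one update.

    old is None for an insert. The first hop is the point leader.
    """
    chain = [("key", cell([key]))]
    for name, attrs in subspaces.items():
        new_hop = (name, cell([new[a] for a in attrs]))
        if old is not None:
            old_hop = (name, cell([old[a] for a in attrs]))
            if old_hop != new_hop:
                chain.append(old_hop)  # old region must drop its copy
        chain.append(new_hop)
    return chain


# Hypothetical layout: a key subspace plus two attribute subspaces.
subspaces = {"name": ("first", "last"), "phone": ("phone",)}
v1 = {"first": "John", "last": "Smith", "phone": 6075551024}
v2 = dict(v1, phone=6075550000)

insert_chain = build_chain("jsmith1", None, v1, subspaces)
update_chain = build_chain("jsmith1", v1, v2, subspaces)
print(insert_chain)
print(update_chain)
```

Because the chain is a function of the key and the old and new values, every client and server derives the same chain for the same update.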

VALUE-DEPENDENT CHAINING Replication. HyperDex inserts replicas for each region into the chain.

Consider Update u1:
-  h1, h2, h3
-  h1, h1‘, h2, h2‘, h3, h3‘
-  h1, h1‘, h2‘, h2‘‘, h3, h3‘

Replicas are always updated first. Failures do not compromise strong consistency. Clients are only acknowledged after full replication is achieved.
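The replication rule can be modelled roughly as follows. This is a toy sketch under stated assumptions: node names are plain strings, replicas are marked with ASCII apostrophes, `stores` is an in-memory dict standing in for the nodes' disks, and the order in which nodes commit is not modelled. It only shows the two properties named above: every region gets f extra replicas, and the client is acknowledged only if every region still has a live copy that stored the update.

```python
def replicated_chain(chain, f):
    """Insert f replicas after every hop: h1 becomes h1, h1', h1'', ..."""
    return [hop + "'" * i for hop in chain for i in range(f + 1)]


def apply_update(chain, stores, key, value, failed=frozenset()):
    """Store the update on every live node; return True only on full replication."""
    live = [node for node in chain if node not in failed]
    regions_needed = {node.rstrip("'") for node in chain}
    regions_alive = {node.rstrip("'") for node in live}
    if regions_alive != regions_needed:
        return False  # a whole region is gone: more than f failures, no ack
    for node in live:
        stores.setdefault(node, {})[key] = value
    return True


chain = replicated_chain(["h1", "h2", "h3"], f=1)
print(chain)  # ['h1', "h1'", 'h2', "h2'", 'h3', "h3'"]

stores = {}
acked = apply_update(chain, stores, "k", "v1", failed={"h2"})
lost = apply_update(chain, {}, "k", "v1", failed={"h2", "h2'"})
print(acked, lost)  # True False
```

With f = 1, losing h2 is survivable because h2' still holds the region; losing both exceeds the threshold and the update is not acknowledged.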

THE PARTS OF THE MACHINE HyperDex - Nuts and Bolts

THE PARTS OF THE MACHINE The Slave Node. Everything is C++. The slave nodes are not particularly interesting.

THE PARTS OF THE MACHINE The Coordinator & the Configuration. A logically centralized coordinator maintains global state.
-  Own replicated state machine for the coordinator, called “replicant”. This is what ZooKeeper does for Hadoop et al.
-  Global state is maintained as the Configuration.
-  The coordinator holds no state of the stored objects, only mappings and servers.
-  Instance: IP, Port, Instance ID.
-  The coordinator creates new configurations based on changes and failures and distributes them to the clients.

THE PARTS OF THE MACHINE The Client. The client is part of the whole system, not just a customer.
-  Client receives new configurations from the coordinator.
-  Switching to a new configuration is atomic.
-  Client only contacts relevant nodes. This is significant for performance.

-  Clients must be “intelligent”. No REST.
-  A load-balancing proxy layer could help, but there isn’t one.

-  Full support for C++, Python.
-  Partial support for Java (uses the C++ driver through JNI), Node.js, Ruby.
-  Using layers skips features. The Java driver doesn’t support “count”.
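What an "intelligent" client has to do can be sketched as follows. This is a hypothetical model, not the real client library: the class name, the dict-based configuration, and the instance IDs are made up for illustration (the ports are the ones used in the tutorial slides). The client keeps one immutable configuration snapshot, swaps it in a single step when the coordinator sends a newer version, and uses it to find the one node responsible for a region.

```python
import threading


class SketchClient:
    """Toy client holding one configuration snapshot at a time."""

    def __init__(self, config):
        self._lock = threading.Lock()
        self._config = config

    def install(self, new_config):
        """Switch to a newer configuration in one step; ignore stale ones."""
        with self._lock:
            if new_config["version"] > self._config["version"]:
                self._config = new_config
                return True
            return False

    def route(self, region):
        """Return (version, instance) from a single consistent snapshot."""
        config = self._config  # read the reference once
        return config["version"], config["regions"][region]


# Instances are (IP, port, instance ID); the IDs are invented for the example.
v1 = {"version": 1, "regions": {0: ("127.0.0.1", 2022, 1), 1: ("127.0.0.1", 2032, 2)}}
v2 = {"version": 2, "regions": {0: ("127.0.0.1", 2032, 2), 1: ("127.0.0.1", 2032, 2)}}

client = SketchClient(v1)
print(client.route(0))
client.install(v2)  # e.g. the node on port 2022 failed
print(client.route(0))
```

Because routing happens in the client, a thin REST wrapper would lose this direct path to the right node, which is the performance point made above.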

THE REAL WORLD™ HyperDex Tutorial

THE REAL WORLD ™ Install. Pre-built packages. Supports CentOS, Debian, Fedora, Ubuntu. But not all versions. And not everything. Read: “No package for the Java driver”. Build from source. Good luck. Be super conscious of package versions. More on that in a minute.

THE REAL WORLD ™ Start the Daemons.
Coordinator.
# hyperdex coordinator -f -l 127.0.0.1 -p 1982
Data Nodes.
# hyperdex daemon -f --listen=127.0.0.1 --listen-port=2022 \
    --coordinator=127.0.0.1 --coordinator-port=1982 \
    --data=./data0/
# hyperdex daemon -f --listen=127.0.0.1 --listen-port=2032 \
    --coordinator=127.0.0.1 --coordinator-port=1982 \
    --data=./data1/

THE REAL WORLD ™ Client Demo The Python client is the HyperDex shell. Create Hyperspace. # python

THE REAL WORLD ™ Client Demo Create a client. Basic PUT/GET. Uses Key subspace. # python

THE REAL WORLD ™ Client Demo Search. Uses further subspaces.

THE REAL WORLD ™ Client Demo Updates and Range Query/Search.

THE REAL WORLD™ Performance

THE REAL WORLD ™ Bashing the Prophetess & the Giant. Performance Benchmarks use the YCSB against Cassandra and MongoDB. Dedicated cluster of 14 Nodes in the VICCI cloud. Take it with a grain of salt. I’m missing Riak.

THE REAL WORLD ™ Throughput.

THE REAL WORLD ™ Latency.

THE REAL WORLD ™ Scaling.

THE REAL WORLD™ Experiences & Findings

THE REAL WORLD ™ Experiences. Findings. Minor versions are incompatible.
-  hyperdex-1.0.rc4 vs. hyperdex-1.0.rc5
-  import hyperclient vs. hyperdex.admin, hyperdex.client
-  (hyperdisk) vs. leveldb vs. hyperleveldb
-  There goes my dream of using the PHP driver on GitHub.
-  Migration? No idea.
-  Compile? Use VM to go.

THE REAL WORLD ™ Experiences. Findings. It’s just a K/V store. -  No methods to do distributed computations. Python map/reduce is on the agenda.

No Dynamo Ring. But a chain to rule them all. -  Fault-tolerance with f dedicated nodes is fine, but what about multiple datacenters?

It’s a quite young project with few committers. Important internals change between minor versions. Not much sleep for them. How about your DevOps?

REMEMBER? *log: Storm-based Analytics RT

WHAT ABOUT? *log We chose Riak.
-  Excellent Java driver.
-  We don’t need transactions.
-  During development, our schema will change often.
-  Operational ease, easy to scale, excellent feedback.
-  Map/reduce in Erlang and JS. Can use the result of a secondary index query.
-  Solr Integration with Riak Search. Not at the moment, but we deal with content.

We like HyperDex.
-  Really interesting concepts and advancements, but atm not the perfect fit.
-  Implemented a storage backend abstraction layer. Easy to switch to HyperDex once it’s more mature.

Thank You.
