Cassandra Day NY 2014: Apache Cassandra & Python for The New York Times ⨍aбrik Platform


In this session, you’ll learn how Apache Cassandra is used with Python in the NY Times ⨍aбrik messaging platform. Michael will start with an overview of the NYT ⨍aбrik global message bus platform and its “memory” features, then discuss its use of the open source Apache Cassandra Python driver by DataStax. A progression of benchmarks testing features and performance will be presented, from naive and synchronous to asynchronous with multiple IO loops; these benchmarks are tailored to usage at the NY Times. Code snippets, followed by beer, for those who survive. All code is available on GitHub!

Transcript of Cassandra Day NY 2014: Apache Cassandra & Python for The New York Times ⨍aбrik Platform


Cassandra Python driver: benchmarking concurrency for NYT ⨍aбrik


A Global Mesh with a Memory

Message-based: WebSocket, AMQP, SockJS

If in doubt:
• Resend
• Reconnect
• Reread

Idempotent:
• Replicating
• Racy
• Resolving

Classes of service:
• Gold: replicate/race
• Silver: prioritize
• Bronze: queueable

Millions of users


Message: an event with data

CREATE TABLE source_data (
    hash_key int,          -- real ones are more complex
    message_id timeuuid,
    body blob,             -- whatever
    metadata text,         -- JSON
    PRIMARY KEY (hash_key, message_id)
);
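Because `(hash_key, message_id)` is the primary key, writing the same message twice lands on the same row, which is what makes "if in doubt, resend" safe. The following is a minimal in-memory sketch of that upsert behavior, with a plain dict standing in for the table. The `FakeTable` class is hypothetical and is not part of any driver.

```python
import uuid


class FakeTable:
    """Hypothetical in-memory stand-in for the source_data table."""

    def __init__(self):
        self._rows = {}

    def insert(self, hash_key, message_id, body, metadata):
        # Like a CQL INSERT, this is an upsert: a second write with the
        # same primary key replaces the row instead of adding a new one.
        self._rows[(hash_key, message_id)] = (body, metadata)

    def __len__(self):
        return len(self._rows)


table = FakeTable()
message_id = uuid.uuid1()  # time-based UUID, matching the timeuuid column

# Send once, then resend the identical message after a presumed failure.
table.insert(42, message_id, b"payload", '{"app_id": "demo"}')
table.insert(42, message_id, b"payload", '{"app_id": "demo"}')
row_count = len(table)
```

After both writes the table still holds a single row, so a sender that is unsure whether a write arrived can simply repeat it.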


Push (diagram labels: 1-10 kB, Ack)


Pull (diagram labels: 1 kB, 10-150 kB)

Synchronous: C* Thrift or CQL Native
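A minimal sketch of the synchronous case, assuming a session object whose `execute(query, parameters)` call blocks until the write is acknowledged, as `Session.execute` does in the DataStax driver. `StubSession`, the query text, and `push_synchronously` are illustrative stand-ins so that the loop can run without a cluster.

```python
import time
import uuid


class StubSession:
    """Hypothetical stand-in for a driver session; it only records calls."""

    def __init__(self):
        self.executed = []

    def execute(self, query, parameters):
        # A real session would block here for a network round trip.
        self.executed.append((query, parameters))


def push_synchronously(session, query, rows):
    """Write rows one at a time; each write waits for the previous one."""
    start = time.time()
    for row in rows:
        session.execute(query, row)
    return time.time() - start


QUERY = (
    "INSERT INTO source_data (hash_key, message_id, body) "
    "VALUES (%s, %s, %s)"
)
rows = [(1, uuid.uuid1(), b"x"), (2, uuid.uuid1(), b"y")]
session = StubSession()
elapsed = push_synchronously(session, QUERY, rows)
```

With one request in flight at a time, total time grows with the per-request round trip, which is the baseline the concurrent variants are measured against.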


Concurrent, degree = 3

(using the libev event loop)

Asynchronous: CQL Native only


More Concurrency

Can also try:
• DC Aware
• Token Aware
• Subprocessing


Build one

def build_message(self):
    message = {
        "message_id": str(uuid.uuid1()),
        "hash_key": randint(0, self._hash_key_range),  # int(e ** 8)
        "app_id": self._app_id,
        "timestamp": datetime.utcnow().isoformat() + 'Z',
        "content_type": "application/binary",
        "body": os.urandom(randint(1, self._body_range))  # int(e ** 9)
    }
    return message  # push_message() uses the returned dict
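The inline comments suggest that the two ranges are derived from powers of e. Assuming `e` here means `math.e`, the arithmetic works out as follows, which puts the largest random body at roughly 8 kB, in line with the 1-10 kB push messages.

```python
import math

# Upper bound for the random hash key, per the "int(e ** 8)" comment.
hash_key_range = int(math.e ** 8)   # e ** 8 is about 2980.96

# Upper bound for the random body size in bytes, per "int(e ** 9)".
body_range = int(math.e ** 9)       # e ** 9 is about 8103.08
```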


Kick-off

def push_message(self):
    if next(self._submitted_count) < self._message_count:
        message = self.build_message()
        self.submit_query(message)

def push_initial_data(self):
    self._start_time = time()

    try:
        with self._lock:
            for i in range(
                0, min(CONCURRENCY, self._message_count)
            ):
                self.push_message()
    # (the matching except clause is not shown on the slide)


Put it in the pipeline

def submit_query(self, message):
    body = message.pop('body')

    substitution_args = (
        json.dumps(message, **JSON_DUMPS_ARGS),
        body,
        message['hash_key'],
        uuid.UUID(message['message_id'])
    )

    future = self._cql_session.execute_async(
        self._query, substitution_args
    )

    future.add_callback(self.push_or_finish)
    future.add_errback(self.note_error)


Maintain concurrency or finish

def push_or_finish(self, _):
    try:
        if (
            self._unfinished and
            next(self._confirmed_count) < self._message_count
        ):
            with self._lock:
                self.push_message()
        else:
            self.finish()
    # (the matching except clause is not shown on the slide)
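The kick-off, pipeline, and maintain-concurrency steps can be put together in one runnable sketch. This is an illustration under stated assumptions, not the benchmark's code: `StubSession` is a hypothetical stand-in backed by a thread pool, its futures are `concurrent.futures.Future` objects (so the callback is attached with `add_done_callback` rather than the driver's `add_callback`/`add_errback`), and `Pusher` and its attribute names are invented for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubSession:
    """Hypothetical session: 'writes' run on a small thread pool."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self.written = []

    def execute_async(self, value):
        # Returns immediately with a future, like an asynchronous write.
        return self._pool.submit(self.written.append, value)

    def close(self):
        self._pool.shutdown(wait=True)


class Pusher:
    def __init__(self, session, message_count, concurrency):
        self._session = session
        self._message_count = message_count
        self._concurrency = concurrency
        self._lock = threading.Lock()
        self._submitted = 0
        self._confirmed = 0
        self._done = threading.Event()
        self.max_in_flight = 0

    def _claim_next(self):
        # Reserve the next message number, or None when all are submitted.
        with self._lock:
            if self._submitted >= self._message_count:
                return None
            self._submitted += 1
            in_flight = self._submitted - self._confirmed
            self.max_in_flight = max(self.max_in_flight, in_flight)
            return self._submitted

    def _push_message(self):
        number = self._claim_next()
        if number is not None:
            future = self._session.execute_async(number)
            future.add_done_callback(self._push_or_finish)

    def _push_or_finish(self, _future):
        # Each confirmed write either triggers one new write or ends the run.
        with self._lock:
            self._confirmed += 1
            finished = self._confirmed == self._message_count
        if finished:
            self._done.set()
        else:
            self._push_message()

    def run(self):
        if self._message_count == 0:
            return
        # Kick-off: fill the pipeline up to the concurrency degree.
        for _ in range(min(self._concurrency, self._message_count)):
            self._push_message()
        self._done.wait()


session = StubSession()
pusher = Pusher(session, message_count=20, concurrency=3)
pusher.run()
session.close()
```

The lock is held only while the counters are updated, never while a write is submitted, so a callback that fires immediately on the submitting thread cannot deadlock. Since every new submission after the kick-off is preceded by one confirmation, the number of writes in flight never exceeds the concurrency degree.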


Push (diagram labels: 1-10 kB, Ack)


Push some messages

usage: bm_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-d LOCAL_DC]
                  [--remote-dc-hosts REMOTE_DC_HOSTS] [-p PREFETCH_COUNT]
                  [-w WORKER_COUNT] [-a] [-t]
                  [-n {ONE,TWO,THREE,QUORUM,ALL,LOCAL_QUORUM,EACH_QUORUM,SERIAL,LOCAL_SERIAL,LOCAL_ONE}]
                  [-r] [-j] [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]

Push messages from a RabbitMQ queue into a Cassandra table.


Push messages many times

usage: run_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-i ITERATIONS]
                   [-d LOCAL_DC] [-w [worker_count [worker_count ...]]]
                   [-p [prefetch_count [prefetch_count ...]]]
                   [-n [level [level ...]]] [-a] [-t] [-m MESSAGE_EXPONENT]
                   [-b BODY_EXPONENT]
                   [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]

Run multiple test cases based upon the product of worker_counts, prefetch_counts, and consistency_levels. Each test case may be run with up to 4 variations reflecting the use or not of the dc_aware and token_aware policies. The results are output to stdout as a JSON object.
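The test matrix described here can be sketched with `itertools.product`. How run_push.py actually builds its cases is not shown in the slides, so the function name, the dictionary keys, and the two `vary_*` switches below are assumptions made for illustration.

```python
import json
from itertools import product


def build_test_cases(worker_counts, prefetch_counts, consistency_levels,
                     vary_dc_aware=True, vary_token_aware=True):
    """Return one dict per test case in the full cross product."""
    # Each policy contributes two variations (off, on) when it is varied.
    dc_options = (False, True) if vary_dc_aware else (False,)
    token_options = (False, True) if vary_token_aware else (False,)
    return [
        {
            "worker_count": workers,
            "prefetch_count": prefetch,
            "consistency_level": level,
            "dc_aware": dc_aware,
            "token_aware": token_aware,
        }
        for workers, prefetch, level, dc_aware, token_aware in product(
            worker_counts, prefetch_counts, consistency_levels,
            dc_options, token_options,
        )
    ]


cases = build_test_cases([1, 2], [10, 100], ["ONE", "LOCAL_QUORUM"])
first_case_json = json.dumps(cases[0], sort_keys=True)
```

Two worker counts, two prefetch counts, and two consistency levels give 8 base cases; with both policies varied, each runs in 4 variations, for 32 cases in total.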


Pull (diagram labels: 1 kB, 10-150 kB)
