Camunda and Apache Cassandra

Stan Levine [email protected]

CONTEXTSPACE, CAMUNDA AND CASSANDRA

•  ContextSpace is a platform for executing secure digital customer engagement on a very large scale.

•  Digital engagement requires consistency, a.k.a. Business Process Management.

•  Digital engagement requires Big Identity, with many contextual profiles per person and massive quantities of behavioral data from customer interactions and mobile devices, hence Cassandra.

WHO IS CONTEXTSPACE

Founded July 2011

Offices:

•  Tel Aviv, Israel

•  Sydney, Australia

WHO IS CONTEXTSPACE DESIGNED FOR?

Very large communities of consumers, with many contextual relationships, typically found in:

•  Health Care

•  Telecom

•  Smart Cities

•  Digital Media

•  Retail

•  Energy

•  Service Providers

CONTEXTSPACE ARCHITECTURE

TYPICAL CONTEXTSPACE USE OF CAMUNDA

ContextSpace employs Camunda as a toolset to allow its customers to develop consistent digital engagement business processes, with common underlying API access to all services and data.

We also create Camunda-based service processes to support highly specialized industrial patterns that require:

•  Specific conditional processing

•  Industry data formats (such as HL7 FHIR)

•  Multiple service escalation paths

•  Long running processes

SIMPLE HEALTHCARE PROCESS PATTERN: DIABETES MELLITUS TYPE 2 MONITORING

HIGH VELOCITY HEALTH OBSERVATION INGESTION

mHealth Camunda Processes

Cassandra Persistence Analytics

WHY DOES CONTEXTSPACE USE CASSANDRA?

•  Exceptional performance, linearly scaled

•  No DBAs, no tuning, fault tolerant, lights out operation

•  Inherently multi-data centre aware for active-active distributed operations

•  Simple - replaces more than a dozen heterogeneous technologies that our solution would otherwise require

All ContextSpace data services are based on Cassandra, enabling reliable operations with low operational costs.

OPEN SOURCE CAMUNDA ON CASSANDRA

•  Begun in Berlin as a “hackathon session” in July 2015 between Camunda and ContextSpace

•  Amazing Camunda Team!

•  Maintained by ContextSpace

•  Purpose: extend scalability, availability and distributed processing to Camunda operations

•  Initial goal: support core BPM Engine functions, including Job Executor

•  Target production launch: Q4 2015

CASSANDRA STRENGTHS

•  Great for stable or immutable data

•  Writes are faster than reads (both are very fast)

•  Highly granular, tunable consistency

•  Inherent data retention management

•  Very reliable

•  All nodes identical, no SPOF, no “fail-over” required

•  No effective limit to quantity of managed data

CASSANDRA LIMITATIONS

•  No ad-hoc searches: you need to model your queries, not your data

•  Deleting data can be problematic. Deleting entire rows works well, but “deleting” columns creates “tombstones” that can decrease performance and compromise stability.

•  Rapidly changing columnar data is an anti-pattern

•  No native join operations

•  Rudimentary locking support

•  Rudimentary native indexing support

CASSANDRA INDICES

•  “Out of the Box” secondary indices are limited to low-cardinality data. Beyond that, they are useless or even dangerous to use.

•  Most developers employ custom indexing schemes very successfully. For Camunda, a simple, unsorted reverse lookup table will meet most requirements.

•  The Camunda job scheduler requires sorted indices

•  Therefore, we have developed a Cassandra indexing framework for Camunda that supports both sorted and unsorted indexing.
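The two index styles described above can be sketched in a few lines of Python. This is an in-memory illustration only, not the project's indexing framework: the class names `ReverseLookupIndex` and `SortedIndex`, and the sample keys, are hypothetical. The unsorted variant maps an indexed value to the ids that carry it; the sorted variant keeps its entries ordered so that a range scan (such as "everything up to a given key") is possible.

```python
import bisect
from collections import defaultdict


class ReverseLookupIndex:
    """Unsorted index: indexed value -> set of entity ids (hypothetical model)."""

    def __init__(self):
        self._table = defaultdict(set)

    def put(self, value, entity_id):
        self._table[value].add(entity_id)

    def remove(self, value, entity_id):
        self._table[value].discard(entity_id)

    def lookup(self, value):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._table.get(value, ()))


class SortedIndex:
    """Sorted index: entries kept ordered by (sort_key, entity_id)."""

    def __init__(self):
        self._entries = []

    def put(self, sort_key, entity_id):
        bisect.insort(self._entries, (sort_key, entity_id))

    def up_to(self, max_key):
        """Return entity ids whose sort key is <= max_key, in key order."""
        result = []
        for sort_key, entity_id in self._entries:
            if sort_key > max_key:
                break
            result.append(entity_id)
        return result


# Illustrative usage with made-up ids.
by_business_key = ReverseLookupIndex()
by_business_key.put("order-42", "proc-1")

due_dates = SortedIndex()
due_dates.put(300, "job-c")
due_dates.put(100, "job-a")
due_dates.put(200, "job-b")
```

A reverse lookup answers "which process has this business key?", while only the sorted form can answer "which jobs come first?", which is the case the job scheduler needs.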

CASSANDRA LOCKING

•  Native Cassandra locking is rudimentary

•  Atomic batches can be used for transactions (multiple insert, update, delete in a single logical operation)

•  However, if a batch performs a lock, all statements can only apply to one partition (row).

•  With separate tables used for custom indices, scheduled jobs, etc., we need to use multiple batches within one transaction; however, this breaks the atomicity.

LOCKING APPROACH AND LIMITATIONS

•  We lock an entire process, including executions, event subscriptions, and variables by placing all this data in one Cassandra partition (row).

•  Indices and jobs are still not updated atomically, leading to potential data integrity issues.

•  When locking the entire process, parallel (non-exclusive) tasks will execute sequentially due to optimistic locking. This is tolerable for many use cases, but undesirable for others (such as events that need to occur on time). For example, our urgent customer messaging events need to conform to service levels.
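A minimal sketch of why parallel tasks end up running one after another under per-process optimistic locking, assuming a version number is stored with each process partition. The `ProcessStore` class and its `compare_and_set` method are hypothetical stand-ins for a conditional write against a single Cassandra partition; they are not the project's code.

```python
class ProcessStore:
    """In-memory stand-in for one partition per process instance."""

    def __init__(self):
        self._rows = {}  # process_id -> (version, state dict)

    def create(self, process_id, state):
        self._rows[process_id] = (1, dict(state))

    def read(self, process_id):
        version, state = self._rows[process_id]
        return version, dict(state)

    def compare_and_set(self, process_id, expected_version, new_state):
        """Write only if nobody else has written since we read."""
        version, _ = self._rows[process_id]
        if version != expected_version:
            return False  # lost the race; caller must re-read and retry
        self._rows[process_id] = (version + 1, dict(new_state))
        return True


def complete_task(store, process_id, task_name, max_retries=5):
    """Mark a task done, retrying on version conflicts. Returns attempts used."""
    for attempt in range(1, max_retries + 1):
        version, state = store.read(process_id)
        state[task_name] = "done"
        if store.compare_and_set(process_id, version, state):
            return attempt
    raise RuntimeError("too many conflicts on " + process_id)


# Two "parallel" tasks read the same version of the same process.
store = ProcessStore()
store.create("proc-1", {})
v_a, state_a = store.read("proc-1")
v_b, state_b = store.read("proc-1")
state_a["taskA"] = "done"
state_b["taskB"] = "done"
first = store.compare_and_set("proc-1", v_a, state_a)
second = store.compare_and_set("proc-1", v_b, state_b)  # stale version
attempts = complete_task(store, "proc-1", "taskB")  # re-read, then write
```

The second writer is rejected and has to start over from a fresh read, so the two tasks are effectively serialized even though they never touch the same variable.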

OVERCOMING LOCKING LIMITATIONS - ZOOKEEPER

•  Industry best practice for Cassandra locking is to use an external lock manager. ContextSpace uses Zookeeper, which shares Cassandra's distributed and “lights-out” management strengths.

•  With Zookeeper, we can maintain any degree of locking granularity, which permits parallel execution to be performed.

•  By removing locking from atomic batches, we can maintain full atomicity for transactions.

•  We are currently implementing Zookeeper support and recommend this configuration for production applications.
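The idea of lock granularity can be illustrated without a Zookeeper cluster. The sketch below uses Python's `threading.Lock` as a local stand-in for an external lock manager; the `LockManager` class and the lock-path format are assumptions made purely for illustration. Locking per execution rather than per process lets two non-exclusive tasks of the same process hold locks at the same time.

```python
import threading


class LockManager:
    """Local stand-in for an external lock manager (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def _lock_for(self, path):
        # Create the lock for a path on first use, under a guard lock.
        with self._guard:
            return self._locks.setdefault(path, threading.Lock())

    def try_acquire(self, path):
        """Non-blocking acquire; returns True if the lock was obtained."""
        return self._lock_for(path).acquire(blocking=False)

    def release(self, path):
        self._lock_for(path).release()


def execution_path(process_id, execution_id):
    # Finer-grained than one lock per process: one lock per execution.
    return "/locks/{}/{}".format(process_id, execution_id)


# Illustrative usage with made-up ids.
locks = LockManager()
got_a = locks.try_acquire(execution_path("proc-1", "exec-a"))
got_b = locks.try_acquire(execution_path("proc-1", "exec-b"))
got_a_again = locks.try_acquire(execution_path("proc-1", "exec-a"))
locks.release(execution_path("proc-1", "exec-a"))
locks.release(execution_path("proc-1", "exec-b"))
```

Because the lock no longer lives inside the data batch, the batch itself carries only plain writes, which is what allows it to span tables without giving up atomicity.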

JOB SCHEDULER (ASYNCH EXECUTION)

•  Support for the Job Scheduler has recently been added to this project.

•  Camunda job scheduling presents us with another challenge.

•  We only need to pick jobs that are due. This requires using global ordering of jobs by time.

•  This is essentially a queue.

JOB SCHEDULER (ASYNCH EXECUTION)

•  Cassandra and queuing patterns don’t like each other.

•  Cassandra can only maintain order within a single partition.

•  Remember the “tombstones”? Queuing operations generate a heavy columnar delete workload. This creates the “perfect storm” for Cassandra.

•  Accessing a common partition (which is always stored on a single Cassandra node) will also create a database hotspot.

JOB SCHEDULER (ASYNCH EXECUTION)

•  In this situation, best practice is to partition scheduling data to contain one time slice per partition (such as a day or an hour).

•  This addresses the problem with excessive tombstones, but does not fully alleviate the hotspot. However, given Cassandra performance, this will only become an issue for very large workloads.

•  For high workload deployments, we will implement support for an external queue manager. Kafka is a distributed queue manager that shares lights-out characteristics with Cassandra and Zookeeper and is already part of the ContextSpace architecture.
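A rough sketch of time-slice partitioning for scheduled jobs, assuming hourly buckets. The bucket format, the `JobTable` class, and the `acquire_due` method are hypothetical; the dictionary stands in for a Cassandra table whose partition key is the bucket and whose rows are ordered by due time within the bucket.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bucket_for(due):
    """Partition key: the hour containing the due time (assumed slice size)."""
    return due.strftime("%Y-%m-%dT%H")


class JobTable:
    """Stand-in for a table partitioned by hour bucket, ordered by due time."""

    def __init__(self):
        self._partitions = defaultdict(list)  # bucket -> sorted [(due, job_id)]

    def schedule(self, job_id, due):
        bisect.insort(self._partitions[bucket_for(due)], (due, job_id))

    def acquire_due(self, now, lookback_hours=2):
        """Return ids of jobs due at or before `now`, oldest first.

        Only the current bucket and a few previous ones are read, so the
        scan never touches more than lookback_hours + 1 partitions.
        """
        due_jobs = []
        for hours_ago in range(lookback_hours, -1, -1):
            bucket = bucket_for(now - timedelta(hours=hours_ago))
            for due, job_id in self._partitions.get(bucket, []):
                if due > now:
                    break  # rows are sorted, nothing later is due yet
                due_jobs.append(job_id)
        return due_jobs


# Illustrative usage with made-up job ids and an arbitrary timestamp.
now = datetime(2015, 9, 1, 10, 30, tzinfo=timezone.utc)
jobs = JobTable()
jobs.schedule("job-late", now + timedelta(minutes=20))
jobs.schedule("job-now", now - timedelta(minutes=5))
jobs.schedule("job-old", now - timedelta(hours=1))
due = jobs.acquire_due(now)
```

Every reader still hits the bucket for the current hour, which is the residual hotspot mentioned above; the slicing only bounds how much data, and how many tombstones, a single partition accumulates.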

CASSANDRA PERSISTENCE FOR CAMUNDA

•  Is Cassandra a natural match for Camunda? In a word, not really….

•  Camunda creates and then deletes a lot. Cassandra generally hates deletions.

•  Cassandra locking is enforced at partition (row) level only. Using Zookeeper, we can update process, job execution and indices atomically.

•  Locking one process within a row precludes parallel executions. Again, Zookeeper locking addresses this.

•  The Job Executor, which is always searching for the next jobs due at the current time, will consistently create hotspots in Cassandra. This is fine for most workloads, but for very high loads we will support Kafka.

UNDERSTANDING THE STACK OPTIONS

•  CAMUNDA + CASSANDRA implements optimistic locking and will not support parallel executions, but will be highly available across multiple distributed data centres.

•  CAMUNDA + CASSANDRA + ZOOKEEPER implements granular locking and can support the full capabilities of the CAMUNDA BPM engine.

•  CAMUNDA + CASSANDRA + ZOOKEEPER + KAFKA eliminates all anti-patterns and will provide support for virtually unlimited workloads in a distributed environment.

SUMMARY

•  Cassandra is not a “native fit” for Camunda

•  Therefore, much work has been done to counter anti-patterns

•  Nevertheless, backing Camunda with Cassandra promises incredible performance, scalability, availability and operational gains

•  ContextSpace's focus is on completing operational support for the core BPM engine

•  Additional Camunda user application queries can then be incrementally supported as they are required.

THANK YOU AND BRIEF Q&A

Stan Levine [email protected]
