Developing a Real-time Engine with Akka, Cassandra, and Spray

Jacob Park

Transcript of Developing a Real-time Engine with Akka, Cassandra, and Spray

Page 1: Developing a Real-time Engine with Akka, Cassandra, and Spray

Developing a Real-time Engine with Akka, Cassandra, and Spray
Jacob Park

Page 2: Developing a Real-time Engine with Akka, Cassandra, and Spray

What are Paytm Labs and Paytm?
• Paytm Labs is a data-driven lab that tackles difficult problems in fraud, recommendations, ratings, and platforms for Paytm.
• Paytm is the world's fastest growing mobile-first marketplace and payment ecosystem. It serves over 100 million people, who make over 1.5 million business transactions representing $1.7 billion of goods and services exchanged annually.

Page 3: Developing a Real-time Engine with Akka, Cassandra, and Spray

What is Akka?
• Akka (http://akka.io/):
  • “Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.”
  • Packages: “akka-actor”, “akka-remote”, “akka-cluster”, “akka-persistence”, “akka-http”, and “akka-stream”.

Page 4: Developing a Real-time Engine with Akka, Cassandra, and Spray

What is Cassandra?
• Cassandra (http://cassandra.apache.org/):
  • “The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance.”

Page 5: Developing a Real-time Engine with Akka, Cassandra, and Spray

What is Spray?
• Spray (http://spray.io/):
  • “Spray is an open-source toolkit for building REST/HTTP-based integration layers on top of Scala and Akka.”
  • Packages: “spray-caching”, “spray-can”, “spray-http”, “spray-httpx”, “spray-io”, “spray-json”, “spray-routing”, “spray-servlet”.

Page 6: Developing a Real-time Engine with Akka, Cassandra, and Spray

What is Maquette?
• A real-time fraud rule engine that core operational platforms call synchronously to evaluate fraud.
• Its core technologies are Akka, Cassandra, and Spray.

Page 7: Developing a Real-time Engine with Akka, Cassandra, and Spray

Why Akka, Cassandra, and Spray?
• Akka, Cassandra, and Spray are highly performant and developer-friendly, treat failure as a first-class concept, and provide strong clustering support. Together these help ensure responsiveness, resiliency, and elasticity when building Reactive Systems.

Page 8: Developing a Real-time Engine with Akka, Cassandra, and Spray

Maquette In a Nutshell

[Diagram: the HTTP, Environment, and Executor layers.]

Page 9: Developing a Real-time Engine with Akka, Cassandra, and Spray

Maquette Actor System

Page 10: Developing a Real-time Engine with Akka, Cassandra, and Spray

HTTP Layer
• Utilize Spray-Can for a fast HTTP endpoint.
• Utilize Jackson for JSON deserialization/serialization.
• Utilize a separate dispatcher for the Bulkhead Pattern.
• Expose a normalized yet flexible schema for integration.
• Request Handling: Worst → Best
  • Cameo Pattern (Per-request Actor),
  • Ask Pattern (Future),
  • RequestHandlerPool (Akka Router Pool).

Page 11: Developing a Real-time Engine with Akka, Cassandra, and Spray

HTTP Layer

trait FraudRoute extends BaseRoute with ActorLogging { this: Actor =>

  import SprayJacksonSupportUtils._

  override protected def receiveRequest(
    delegateActorRef: ActorRef,
    parentUriPath: Path
  ): Actor.Receive = {
    case incomingHttpRequest @ HttpRequest(
      HttpMethods.POST,
      requestUri,
      requestHeaders,
      requestEntity,
      requestProtocol
    ) if requestUri.path startsWith parentUriPath =>
      val senderActorRef = sender()

      unmarshalHttpEntityAndDelegateRequest(
        requestEntity,
        delegateActorRef,
        senderActorRef
      )
  }
}

Page 12: Developing a Real-time Engine with Akka, Cassandra, and Spray

Environment Layer
• A tree of actors responsible for managing a cache or pool of the Contexts and Dependencies required to evaluate incoming requests.
• A Context is a Document Message which wraps configurations for evaluating requests.
• A Dependency is a Document Message which wraps optimized queries to Cassandra.

Page 13: Developing a Real-time Engine with Akka, Cassandra, and Spray

Environment Layer
• Map incoming requests to a Context by forking a template with .copy().
• Forward the forked Context to the Executor Layer in the same or a different JVM with an Akka Router.
• Consider implementing a custom router to favour locality of execution on the same JVM until responsiveness requires distribution.
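The forking step in the first bullet can be sketched with plain case classes. The names Request, Context, ruleSetName, and templates below are hypothetical stand-ins, not Maquette's actual types. The sketch only illustrates that .copy() returns a new immutable value and leaves the cached template untouched.

```scala
// Hypothetical stand-ins for Maquette's request and Context types.
final case class Request(id: String, evaluationType: String)
final case class Context(ruleSetName: String, request: Option[Request])

object ForkSketch {
  // Pre-computed templates keyed by evaluation type (assumed layout).
  val templates: Map[String, Context] = Map(
    "payment" -> Context("payment-rules", None)
  )

  // Fork the matching template for one request; None if no template exists.
  def fork(request: Request): Option[Context] =
    templates.get(request.evaluationType).map(_.copy(request = Some(request)))
}
```

Because the template is never mutated, the same cached Context can be forked concurrently for many requests without locking.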

Page 14: Developing a Real-time Engine with Akka, Cassandra, and Spray

Environment Layer
• Always pre-compute and pre-optimize the Environment Layer as a whole.
• Allow Contexts to be pre-computed and updated remotely.
• Design Contexts and Dependencies for optimization by allowing arithmetic reduction or sorting.
• Prefer a ProxyActor and a StateActor for each EnvironmentActor, so that the whole environment is cached and recovery from failures is fast.

Page 15: Developing a Real-time Engine with Akka, Cassandra, and Spray

Environment Layer

type EnvironmentStateActorRefFactory =
  (EnvironmentProxyActorContext, EnvironmentProxyActorSelf) => ActorRef
type EnvironmentActorRefFactory =
  (EnvironmentProxyActorContext, EnvironmentProxyActorSelf) => ActorRef

class EnvironmentProxyActor(
  environmentStateActorRefFactory: EnvironmentStateActorRefFactory,
  environmentActorRefFactory: EnvironmentActorRefFactory
) extends Actor with ActorLogging {

  val environmentStateActorRef = environmentStateActorRefFactory(context, self)
  val environmentActorRef = environmentActorRefFactory(context, self)

  override def receive: Receive =
    receiveEnvironmentState orElse
      receiveFraudRequest orElse
      receiveEnvironmentLocalCommand orElse
      receiveEnvironmentRemoteCommand
}

Page 16: Developing a Real-time Engine with Akka, Cassandra, and Spray

Environment Layer

class EnvironmentStateActor(
  environmentProxyActorRef: ActorRef,
  databaseInstance: Database
) extends Actor with ActorLogging {
  import EnvironmentStateActor._
  import EnvironmentStateFactory._
  import EnvironmentStateLifecycleStrategy._
  import EnvironmentStateRepository._

  var environmentState: Option[EnvironmentState] = None

  override def receive: Receive =
    receiveLocalCommand orElse
      receiveRemoteCommand

  object EnvironmentStateLifecycleStrategy { ... }

  object EnvironmentStateFactory { ... }

  object EnvironmentStateRepository { ... }
}

Page 17: Developing a Real-time Engine with Akka, Cassandra, and Spray

Environment Layer

class EnvironmentActor(
  environmentProxyActor: ActorRef,
  executorActorRef: ActorRef,
  bootActorRef: ActorRef
) extends Actor with ActorLogging {
  import EnvironmentActor._
  import EnvironmentLifecycleStrategy._

  var environmentState: Option[EnvironmentState] = None

  override def receive: Receive =
    receiveEnvironmentState orElse
      receiveFraudRequest

  def forkedMaquetteContext(fraudRequest: FraudRequest): Option[MaquetteContext] = {
    val forkedMaquetteContextOption = for {
      actualEnvironmentState <- environmentState
      actualBaseMaquetteContext <- actualEnvironmentState.maquetteContextMap
        .get(fraudRequest.evaluationType)
      actualForkMaquetteContext = actualBaseMaquetteContext
        .copy(fraudRequest = fraudRequest)
    } yield actualForkMaquetteContext

    forkedMaquetteContextOption
  }
}

Page 18: Developing a Real-time Engine with Akka, Cassandra, and Spray

Executor Layer
• A pipeline of actors responsible for scheduling execution of the Tasks defined within a Context with the specified Dependencies, executing the Tasks, and coordinating the results of the Tasks to provide a response.
• A Task is an optimized set of executable rules.

Page 19: Developing a Real-time Engine with Akka, Cassandra, and Spray

Executor Layer
• Ideally, the Executor Layer should be stateless to allow easy recovery from failures.
• Ideally, keep the Executor Layer available across the cluster.

Page 20: Developing a Real-time Engine with Akka, Cassandra, and Spray

Executor Layer

type ExecutorRouterActorRefFactory =
  (ExecutorActorContext, ExecutorActorSelf) => ActorRef
type ExecutorCoordinatorActorRefFactory =
  (ExecutorActorContext, ExecutorActorSender, ExecutorActorNext, MaquetteContext, Timeout) => ActorRef

class ExecutorActor(
  executorRouterActorRefFactory: ExecutorRouterActorRefFactory,
  executorCoordinatorActorRefFactory: ExecutorCoordinatorActorRefFactory,
  actionActorRef: ActorRef
) extends Actor with ActorLogging {
  import ExecutorActor._
  import ExecutorSchedulerStrategy._

  val executorRouterActorRef: ActorRef = executorRouterActorRefFactory(context, self)

  override def receive: Receive =
    receiveMaquetteContext orElse
      receiveMaquetteResult

  object ExecutorSchedulerStrategy {
    def scheduleExecution(maquetteContext: MaquetteContext): Unit = { ... }
  }
}

Page 21: Developing a Real-time Engine with Akka, Cassandra, and Spray

Executor Layer
• Design a Task as a functional and monadic data structure.
• Following functional programming, a Task should isolate side effects from its functions.
• As a monad, a Task can be composed or reduced, which makes it easy to optimize and allows high parallelization.
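A minimal sketch of what "functional and monadic" could look like, under stated assumptions. The real Task and Rule types are not shown in the talk, so Task, column, and the field names below are hypothetical. A Task here is only a description of a computation over an input row, and nothing runs until run is called.

```scala
// Hypothetical Task: a pure description of a computation over an input row.
// Composition via map/flatMap performs no work and has no side effects.
final case class Task[A](run: Map[String, Double] => A) {
  def map[B](f: A => B): Task[B] = Task(row => f(run(row)))
  def flatMap[B](f: A => Task[B]): Task[B] = Task(row => f(run(row)).run(row))
}

object TaskSketch {
  // Read one column from the input row (0.0 if absent, an arbitrary choice).
  def column(name: String): Task[Double] = Task(_.getOrElse(name, 0.0))

  // Two independent sub-tasks composed into one rule-like predicate.
  val suspicious: Task[Boolean] = for {
    amount <- column("amount")
    count  <- column("txCountLastHour")
  } yield amount > 1000.0 && count > 5.0
}
```

For example, TaskSketch.suspicious.run(Map("amount" -> 2500.0, "txCountLastHour" -> 9.0)) evaluates the composed rule against one row. Because the composed value is plain data plus functions, a scheduler is free to inspect, merge, or distribute Tasks before running them.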

Page 22: Developing a Real-time Engine with Akka, Cassandra, and Spray

Executor Layer

case class Query(
  selectComponent: Select,
  fromComponent: From,
  whereComponent: Where
) {
  def + (that: Query): Query = {
    this.copy(selectComponent =
      Select(this.selectComponent.columnNames union that.selectComponent.columnNames)
    )
  }

  def - (that: Query): Query = {
    this.copy(selectComponent =
      Select(this.selectComponent.columnNames diff that.selectComponent.columnNames)
    )
  }
}

Note: An example of a Rule object is not shown as it is a trade secret.
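A usage sketch for the + and - operators of Query. The slides do not show Select, From, or Where, so their shapes here (a Set of column names, a table name, a clause string) are assumptions made only to keep the example self-contained. The table and column names are also invented.

```scala
// Assumed component shapes; the slides do not show these definitions.
final case class Select(columnNames: Set[String])
final case class From(tableName: String)
final case class Where(clause: String)

final case class Query(selectComponent: Select, fromComponent: From, whereComponent: Where) {
  // Merge the projected columns of two queries.
  def +(that: Query): Query =
    copy(selectComponent = Select(selectComponent.columnNames union that.selectComponent.columnNames))

  // Drop the columns that another query already fetches.
  def -(that: Query): Query =
    copy(selectComponent = Select(selectComponent.columnNames diff that.selectComponent.columnNames))
}

object QuerySketch {
  private val from  = From("transactions")
  private val where = Where("user_id = ?")

  val a: Query = Query(Select(Set("amount", "merchant")), from, where)
  val b: Query = Query(Select(Set("merchant", "country")), from, where)

  val merged: Query    = a + b      // columns: amount, merchant, country
  val remainder: Query = merged - a // columns: country
}
```

With this algebra, two rules that read overlapping columns from the same table can share one merged read instead of issuing two.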

Page 23: Developing a Real-time Engine with Akka, Cassandra, and Spray

Executor Layer
• For a Task object, consider using an external DSL that is interpreted into executable, immutable graphs, or even compiled to Java bytecode.
  • Scala Parser Combinators: https://github.com/scala/scala-parser-combinators
  • Parboiled2: https://github.com/sirthias/parboiled2
  • ANTLR: http://www.antlr.org/

Page 24: Developing a Real-time Engine with Akka, Cassandra, and Spray

Executor Layer

object QueryParser extends JavaTokenParsers {
  def parseQuery(queryString: String): Try[Query] = {
    parseAll(queryStatement, queryString) ...
  }

  object QueryGrammar {
    lazy val queryStatement: Parser[Query] =
      selectClause ~ fromClause ~ opt(whereClause) ~ ";" ^^ {
        case selectComponent ~ fromComponent ~ whereComponent ~ ";" =>
          Query(selectComponent, fromComponent, whereComponent.getOrElse(Where.Empty))
      }
  }

  object SelectGrammar { ... }
  object FromGrammar { ... }
  object WhereGrammar { ... }
  object StaticClauseGrammar { ... }
  object DynamicClauseGrammar { ... }
  object InterpolationTypeGrammar { ... }
  object DataTypeGrammar { ... }
  object LexicalGrammar { ... }
}

Note: An example of a Rule parser is not shown as it is a trade secret.

Page 25: Developing a Real-time Engine with Akka, Cassandra, and Spray

Abstracting Concurrency for High Parallelism Tasks
• Scala Futures.
• Scala Parallel Collections.
• Akka Router Pool.
• Akka Streams.

Page 26: Developing a Real-time Engine with Akka, Cassandra, and Spray

Scala Futures
• “A Future is an object holding a value which may become available at some point.”

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val f = for {
  a <- Future(10 / 2)
  b <- Future(a + 1)
  c <- Future(a - 1)
  if c > 3
} yield b * c

f foreach println

Page 27: Developing a Real-time Engine with Akka, Cassandra, and Spray

Scala Futures
• Advantages: Efficient, Highly Parallel, Simple Monadic Abstraction.
• Disadvantages: Lacks Communication, Lacks Low-Level Concurrency Control, JVM Bound.
• Note: Monadic Futures Enqueue All Operations to the ExecutionContext ⇒ Lack of Control over Context-Switching.
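The note on the ExecutionContext can be made concrete with a small sketch. Every step of a for-comprehension over Futures (each flatMap, map, and filter) is submitted as a separate task to whichever ExecutionContext is in implicit scope. The one coarse control available is choosing that context. The dedicated two-thread pool, the object name FutureSketch, and the five-second timeout are illustrative choices, not something the talk prescribes.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureSketch {
  // Runs the computation on a dedicated two-thread pool and blocks for the result.
  def compute(): Int = {
    val pool = Executors.newFixedThreadPool(2)
    // Every Future step below is enqueued on this context, not the global one.
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val f = for {
        a <- Future(10 / 2)
        b <- Future(a + 1)
        c <- Future(a - 1)
        if c > 3
      } yield b * c
      Await.result(f, 5.seconds) // blocking is for demonstration only
    } finally pool.shutdown()
  }
}
```

FutureSketch.compute() returns 24 (a = 5, b = 6, c = 4). Which thread runs each step, and when a switch happens, is still decided by the pool rather than by the calling code.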

Page 28: Developing a Real-time Engine with Akka, Cassandra, and Spray

Scala Parallel Collections
• Scala Parallel Collections is a package in the Scala standard library (a separate module since Scala 2.13) which allows collections to execute operations in parallel.

(0 until 100000).par
  .filter(x => x.toString == x.toString.reverse)

Page 29: Developing a Real-time Engine with Akka, Cassandra, and Spray

Scala Parallel Collections
• Advantages: Very Efficient, Highly Parallel, Control of Parallelism Level.
• Disadvantages: Lacks Communication, Non-parallelizable Operations (for example foldLeft() and foldRight(), which run sequentially; fold() and aggregate() are the parallelizable alternatives), Non-determinism and Side-effect Issues Given the Degree of Abstraction, JVM Bound.

Page 30: Developing a Real-time Engine with Akka, Cassandra, and Spray

Akka Router Pool
• An Akka Router Pool maintains a pool of child actors to which it forwards messages.
• If an Akka Router Pool is configured with an appropriate dispatcher, mailbox, supervisor, and routing logic, it provides a highly parallel yet elastic construct for executing tasks.

Page 31: Developing a Real-time Engine with Akka, Cassandra, and Spray

Akka Router Pool

val routerSupervisionStrategy = OneForOneStrategy() {
  case _ => SupervisorStrategy.Restart
}
val routerPool = FromConfig
  .withSupervisorStrategy(routerSupervisionStrategy)
val routerProps = routerPool.props(
  ExecutorWorkerActor.props(accessLayer)
    .withDispatcher(DispatcherConfigPath))

context.actorOf(
  props = routerProps,
  name = RouterName)

Page 32: Developing a Real-time Engine with Akka, Cassandra, and Spray

Akka Router Pool
• Advantages:
  • Work-Pull Pattern = Rate Limiting.
  • Bounded Mailbox = Backpressure.
  • SupervisionStrategy = Failure.
  • Scheduler = Timeout.
  • Router Resizer = Predictive Parallelism & Scaling.
  • Dispatcher Throughput = Predictive Context Switching.
  • Location Transparency = JVM Unbound.

Page 33: Developing a Real-time Engine with Akka, Cassandra, and Spray

Akka Router Pool
• Disadvantages:
  • Complex optimizations or implementation required.
  • Actors with state potentially lead to issues regarding mutability and lack of idempotence.
  • Actors which require communication beyond parent-child trees lead to potentially complex graphs.

Page 34: Developing a Real-time Engine with Akka, Cassandra, and Spray

Akka Streams
• “Akka Streams is an implementation of Reactive Streams, which is a standard for asynchronous stream processing with non-blocking backpressure.”

implicit val system = ActorSystem("reactive-tweets")
implicit val materializer = ActorMaterializer()

val authors: Source[Author, Unit] =
  tweets
    .filter(_.hashtags.contains(akka))
    .map(_.author)

authors.runWith(Sink.foreach(println))

Page 35: Developing a Real-time Engine with Akka, Cassandra, and Spray

Akka Streams
• Advantages: Backpressure and Failure as First-class Concepts, Concurrency Control, Simple Monadic Abstraction, Graph API, Bi-directional Channels.
• Disadvantages: Too New = Risk for Production.
• Current: JVM Bounded; Potentially: Distributed Streaming.
• Current: No Graph Optimization; Potentially: Macro-based Optimization.

Page 36: Developing a Real-time Engine with Akka, Cassandra, and Spray

Maquette Performance
• In a staging environment with 10 Cassandra nodes, 4 Maquette nodes, and an HAProxy instance, Maquette served ~40 000 requests per second with a mean response time of 10 milliseconds while evaluating 50 rules.
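As a rough reading of these figures, Little's law (mean requests in flight = arrival rate × mean time in system) gives the concurrency implied by the throughput and latency. The per-node values below additionally assume that load is spread evenly across the 4 Maquette nodes, which the slide does not state. This is back-of-the-envelope arithmetic, not a measurement.

```scala
object CapacitySketch {
  val requestsPerSecond = 40000 // from the slide (approximate)
  val meanLatencyMillis = 10    // from the slide
  val maquetteNodes     = 4     // from the slide

  // Little's law: in-flight requests = arrival rate * mean time in system.
  val inFlight: Int = requestsPerSecond * meanLatencyMillis / 1000

  // Assumption: the load balancer distributes requests evenly across nodes.
  val perNodeRate: Int     = requestsPerSecond / maquetteNodes
  val perNodeInFlight: Int = inFlight / maquetteNodes
}
```

Under these assumptions, about 400 requests are in flight across the cluster at any moment, or about 100 per node, with each node serving about 10 000 requests per second.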

Page 37: Developing a Real-time Engine with Akka, Cassandra, and Spray

Tips
• Investigate Akka Streams for Akka HTTP.
• Investigate CPU usage and memory consumption: YourKit or VisualVM and Eclipse MAT.
• Utilize Kamon for real-time metrics to StatsD or a third-party service like Datadog.
• If implementing a DSL or a complex actor-based graph, remember to utilize ScalaTest and Akka TestKit properly.
• Utilize Gatling.io for load and scenario based testing.

Page 38: Developing a Real-time Engine with Akka, Cassandra, and Spray

Tips
• We used Cassandra 2.1.6 as our main data store for Maquette. We experienced many pains with operating Cassandra.
• Mastering Apache Cassandra (2nd Edition): http://www.amazon.com/Mastering-Apache-Cassandra-Second-Edition-ebook/dp/B00VAG2WZO

Page 39: Developing a Real-time Engine with Akka, Cassandra, and Spray

Tips
• Investigate the Play Framework with Akka Cluster to create a web application for operations:
  • Commands to operate instances in the cluster.
  • Commands to configure instances in real time.
  • A GUI for data scientists and business analysts to easily define and configure rules.

Page 40: Developing a Real-time Engine with Akka, Cassandra, and Spray

Tips
• Utilize Kafka to publish audits, which can be used to monitor rules through a Logstash, Elasticsearch, and Kibana flow and can be archived in HDFS.
• Consider using Kafka to replay audits as requests, so the real-time engine can run offline for tuning rules.

Page 41: Developing a Real-time Engine with Akka, Cassandra, and Spray

Resources
• The Reactive Manifesto: http://www.reactivemanifesto.org/
• Reactive Messaging Patterns with the Actor Model: http://www.amazon.ca/Reactive-Messaging-Patterns-Actor-Model/dp/0133846830
• Learning Concurrent Programming in Scala: http://www.amazon.com/Learning-Concurrent-Programming-Aleksandar-Prokopec/dp/1783281413
• Akka Concurrency: http://www.amazon.ca/Akka-Concurrency-Derek-Wyatt/dp/0981531660

Page 42: Developing a Real-time Engine with Akka, Cassandra, and Spray

Thank you!
Jacob Park