Journey into Reactive Streams and Akka Streams


A journey into stream processing with

Reactive Streams and

Akka Streams

Before we get started...

http://scalaupnorth.com/

Scala Up North, September 25 & 26

• Keynote from Bill Venners

• BoldRadius offering Scala training

http://boldradius.com

What to expect

• Core concepts

• What is a stream?

• Common use cases?

• The Reactive Streams specification

• A deep-dive into Akka Streams

• Code walkthrough and demo

• Q&A

Disclaimer

• I am not a stream processing expert, but I am passionately curious about an alternate approach to common problems

• This is a deep topic, the contents of this talk are a starting point for further exploration

• Feel free to jump in

Core Concepts

Part 1 of 5

What is a stream?

• Flow of data

• Events, commands, machine data, etc

• Live or at rest

• Bounded or unbounded in size

• Similar to an array laid out in time instead of memory

Appeal of stream processing?

• Scaling business logic

• Processing real-time data (fast data)

• Batch processing of large data sets (big data)

• Monitoring, analytics, complex event processing, etc

Scaling business logic

• Streams can be useful for modelling and breaking apart monolithic apps that primarily transform data

• Async stream processing steps can be scaled individually

Processing real-time data

• Ephemeral

• Unbounded in size

• Potential "flooding" downstream

You cannot step twice into the same stream. For as you are stepping in, other waters are ever flowing on to you. — Heraclitus

Push vs pull

Pull

1. Consumer calls producer

2. Consumer blocks

3. Producer sends data when available

Works best when producer is faster than consumer

Push

1. Producer sends data to consumer

Works best when producer is slower than the consumer

Backpressure

Backpressure?

• We need a way to signal when a consumer is able to process more data

• Propagate backpressure through the entire flow

• Without backpressure data keeps flowing at full speed

• Leads to OOM errors, crashes, etc

Consumer usually has some kind of buffer.

Fast producers can overwhelm the buffer of a slow consumer.

Option 1: Use bounded buffer and drop messages.

Option 2: Increase buffer size if memory available.

Option 3: Pull-based backpressure.
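
As illustration, a framework-free sketch contrasting option 1 with option 3 (the BoundedConsumer class and its methods are made up for this example):

    import scala.collection.mutable

    // The consumer owns a bounded buffer. A producer can either try its luck
    // and drop on overflow (option 1), or ask for the remaining demand before
    // sending anything (option 3).
    class BoundedConsumer(capacity: Int) {
      private val buffer = mutable.Queue.empty[Int]

      // Option 3: the producer reads the remaining demand and never overflows us.
      def demand: Int = capacity - buffer.size

      // Option 1: accept the element only if there is room, otherwise drop it.
      def offer(element: Int): Boolean =
        if (buffer.size < capacity) { buffer.enqueue(element); true }
        else false

      def drainOne(): Option[Int] =
        if (buffer.nonEmpty) Some(buffer.dequeue()) else None
    }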

Reactive Streams

Part 2 of 5

Reactive Streams

Reactive Streams is a specification and low-level API for library developers.

Compliant RS implementations include the following:

• RxJava (Netflix)

• Reactor (Pivotal)

• Vert.x (RedHat)

• Akka Streams and Slick (Typesafe)

Three main repositories

• Reactive Streams for the JVM

• Reactive Streams for JavaScript

• Reactive Streams IO (for network protocols such as TCP, WebSockets and possibly HTTP/2)

• Early exploration kicked off by Netflix

• 2016 timeframe

Reactive Streams JVM API spec

Only for library builders, not for direct usage.

    public interface Processor<T, R> extends Subscriber<T>, Publisher<R> {}

    public interface Publisher<T> {
        public void subscribe(Subscriber<? super T> s);
    }

    public interface Subscriber<T> {
        public void onSubscribe(Subscription s);
        public void onNext(T t);
        public void onError(Throwable t);
        public void onComplete();
    }

    public interface Subscription {
        public void request(long n);
        public void cancel();
    }
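
To make the demand protocol concrete, here is a hedged sketch of a Subscriber written directly against these interfaces (something only a library author would normally do). It requests elements in fixed batches and asks for more only after processing the current batch; several rules of the full spec (null checks, cancellation handling, etc.) are omitted:

    import org.reactivestreams.{ Subscriber, Subscription }

    class BatchingSubscriber(batchSize: Int) extends Subscriber[String] {
      private var subscription: Subscription = _
      private var outstanding: Long = 0

      override def onSubscribe(s: Subscription): Unit = {
        subscription = s
        outstanding = batchSize
        s.request(batchSize)                   // signal initial demand
      }

      override def onNext(element: String): Unit = {
        println(element)
        outstanding -= 1
        if (outstanding == 0) {                // only ask for more when ready
          outstanding = batchSize
          subscription.request(batchSize)
        }
      }

      override def onError(t: Throwable): Unit = t.printStackTrace()

      override def onComplete(): Unit = println("done")
    }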

Faster publisher responsibilities?

• Not generate elements, if it is able to control their production rate

• Try buffering the elements in a bounded manner until more demand is signalled

• Drop elements until more demand is signalled

• Tear down the stream if unable to apply any of the above strategies

Reactive Streams

Visit the Reactive Streams website for more information.

http://www.reactive-streams.org/

Details:

• TCK (Technology Compatibility Kit)

• API (JVM, JavaScript)

• Specifications

• Early conversation on future spec for IO

Akka Streams

Part 3 of 5

Akka Streams

Akka Streams provides a way to express and run a chain of asynchronous processing steps acting on a sequence of elements.

• DSL for async/non-blocking stream processing

• With "free" backpressure

• Conforms to the Reactive Streams spec for compatibility

Basics

• Source - A processing stage with exactly one output

• Sink - A processing stage with exactly one input

• Flow - A processing stage which has exactly one input and output

• RunnableFlow - A Flow that has both ends "attached" to a Source and Sink
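
A minimal sketch of how these pieces snap together (assuming a 1.0-era API; in earlier milestones the materializer was called ActorFlowMaterializer, as in the error-handling example later):

    import akka.actor.ActorSystem
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.{ Flow, Sink, Source }

    implicit val system = ActorSystem("basics")
    implicit val mat = ActorMaterializer()     // turns blueprints into running actors

    val source = Source(1 to 10)               // exactly one output
    val flow   = Flow[Int].map(_ * 2)          // exactly one input and one output
    val sink   = Sink.foreach[Int](println)    // exactly one input

    // Attaching both ends yields a runnable blueprint; nothing runs until run().
    source.via(flow).to(sink).run()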

API design

Goals

• Supremely composable

• Exhaustive model, everything you need for stream processing including error handling

API design

Considerations

• Immutable, reusable stream blueprints

• Explicit materialization step

• No magic at the expense of some extra code

Materialization

• Separate the what from the how

• Declarative Source/Flow/Sink to create a blueprint

• FlowMaterializer turns blueprint into actors

• Involves an extra step, but no magic
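
A hedged sketch of that extra step: the blueprint below is an immutable description, and every run() materializes a fresh chain of actors and yields a new materialized value (here a Future of the fold result):

    import scala.concurrent.Future
    import akka.actor.ActorSystem
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.{ Keep, Sink, Source }

    implicit val system = ActorSystem("materialization")
    implicit val mat = ActorMaterializer()

    // Keep.right keeps the Sink's materialized value (the Future of the sum).
    val blueprint = Source(1 to 100).toMat(Sink.fold[Int, Int](0)(_ + _))(Keep.right)

    // Each run() creates an independent set of actors and a new Future.
    val firstSum: Future[Int] = blueprint.run()
    val secondSum: Future[Int] = blueprint.run()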

Error handling

• The element causing division by zero will be dropped

• Result will be a Future completed with Success(228)

    val decider: Supervision.Decider = exc => exc match {
      case _: ArithmeticException => Supervision.Resume
      case _                      => Supervision.Stop
    }

    // ActorFlowMaterializer takes the list of transformations comprising an
    // akka.stream.scaladsl.Flow and materializes them in the form of an
    // org.reactivestreams.Processor
    implicit val mat = ActorFlowMaterializer(
      ActorFlowMaterializerSettings(system).withSupervisionStrategy(decider))

    val source = Source(0 to 5).map(100 / _)
    val result = source.runWith(Sink.fold(0)(_ + _))

Dynamic push/pull backpressure

• Fast consumer can issue more Request(n) even before more data arrives

• Producer can accumulate demand

• Publishing up to the total accumulated demand is safe

• Consumer's buffer will never overflow

• Default is push-based until consumer cannot cope
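
Explicit buffers remain available as a tuning knob on top of that protocol. A hedged sketch of how the earlier overflow options map onto the API (operator and strategy names as found in current releases):

    import akka.stream.OverflowStrategy
    import akka.stream.scaladsl.Flow

    // With backpressure the buffer slows the upstream down once it fills up;
    // with dropHead it stays push-like and sheds the oldest buffered element.
    val smoothing  = Flow[Int].buffer(64, OverflowStrategy.backpressure)
    val bestEffort = Flow[Int].buffer(64, OverflowStrategy.dropHead)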

Fan out

• Broadcast[T] (1 input, n outputs)

• Signals each output given an input signal

• Balance[T] (1 input, n outputs)

• Signals one of its output ports given an input signal

• FlexiRoute[In] (1 input, n outputs)

• Write custom fan out elements using a simple DSL

Fan in

• Merge[In] (n inputs, 1 output)

• Picks signals randomly from inputs

• Zip[A,B,Out] (2 inputs, 1 output)

• Zipping into an (A,B) tuple stream

• Concat[T] (2 inputs, 1 output)

• Concatenate streams (first, then second)

    val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder =>
      import FlowGraph.Implicits._

      val in = Source(1 to 10)
      val out = Sink.ignore

      val bcast = builder.add(Broadcast[Int](2))
      val merge = builder.add(Merge[Int](2))

      val f1, f2, f3, f4 = Flow[Int].map(_ + 10)

      in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
                  bcast ~> f4 ~> merge
    }

conflate

    abstract def conflate[S](seed: (T) ⇒ S, aggregate: (S, T) ⇒ S): Flow[S]

Allows a faster upstream to progress independently of a slower consumer by conflating elements into a summary until the consumer is ready to accept them.
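
A hedged usage sketch (the scaladsl form is curried, and later Akka Streams releases rename the operator to conflateWithSeed): values that arrive while the slow consumer is busy are folded into a running sum, so the consumer sees a few large summaries instead of every element.

    import akka.actor.ActorSystem
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.{ Sink, Source }

    implicit val system = ActorSystem("conflate-example")
    implicit val mat = ActorMaterializer()

    // The consumer sleeps between elements; everything produced in the
    // meantime is conflated into a single summed element.
    Source(1 to 1000)
      .conflate((first: Int) => first)((sum, next) => sum + next)
      .runWith(Sink.foreach { sum => Thread.sleep(100); println(s"received $sum") })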

groupedWithin

    abstract def groupedWithin(n: Int, d: FiniteDuration): Flow[Seq[T]]

Chunk up this stream into groups of elements received within a time window, or limited by the given number of elements, whichever happens first.
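
A hedged sketch of a typical use, batching elements for bulk writes and flushing either at 100 elements or after 50 milliseconds, whichever comes first:

    import scala.concurrent.duration._
    import akka.stream.scaladsl.Flow

    // Groups of at most 100 elements, flushed at least every 50 milliseconds.
    val batching = Flow[Int]
      .groupedWithin(100, 50.millis)
      .map(batch => s"bulk-writing ${batch.size} records")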

Simple streaming from/to Kafka

    implicit val actorSystem = ActorSystem("ReactiveKafka")
    implicit val materializer = ActorMaterializer()

    val kafka = new ReactiveKafka(host = "localhost:9092", zooKeeperHost = "localhost:2181")
    val publisher = kafka.consume("lowercaseStrings", "groupName", new StringDecoder())
    val subscriber = kafka.publish("uppercaseStrings", "groupName", new StringEncoder())

    // consume lowercase strings from Kafka and publish them transformed to uppercase
    Source(publisher).map(_.toUpperCase).to(Sink(subscriber)).run()

Akka Streams versus other streams

Part 4 of 5

Akka Streams

• Distributed and fault-tolerant

• Sensitive to bidirectional pressure

• Easy to program complex processing flow graphs

Java Streams

• Iterators with a weaker but more parallelism-friendly interface

• Only high-level control (no next/hasNext)

• Transformation, not distribution

• Push or pull chosen statically

RxJava

• Pure push model

• Extensive DSL for transformations

• Only allows blocking backpressure

• Unbounded buffering across async boundary

Code review and demo

Part 5 of 5

Source code available at https://github.com/rocketpages

Thank you!