Building Reactive Data-centric Applications with Vortex, Apache Spark and ReactiveX

Angelo Corsaro, PhD, Chief Technology Officer

Reactive Data Centric Architectures with Vortex, Spark and ReactiveX


Page 2: Getting Reactive

Page 6: Getting Data Centric

Page 11: Reactive and Data Centric Architectures

Page 13: Reactive Data-Centric Systems with Vortex

Page 14: Step 1: Data Centricity in Vortex

Page 15: High Level Abstraction

Copyright PrismTech, 2015

Vortex provides a Distributed Data Space abstraction where applications can autonomously and asynchronously read and write data, enjoying spatial and temporal decoupling

Its built-in dynamic discovery isolates applications from network topology and connectivity details

Vortex’s Data Space is completely decentralised

[Figure: DDS Global Data Space. Data Writers and Data Readers exchange data through Topics A, B, C and D, each with its own QoS.]

Page 17: Topic

A Topic defines a domain-wide class of information

A Topic is defined by means of a (name, type, qos) tuple, where

• name: identifies the topic within the domain

• type: is the programming language type associated with the topic. Types are extensible and evolvable

• qos: is a collection of policies that express the non-functional properties of this topic, e.g. reliability, persistence, etc.

[Figure: a Topic, made up of Name, Type and QoS]

struct TemperatureSensor {
    @key long sid;
    float temp;
    float hum;
};
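
To make the (name, type, qos) tuple concrete, here is a minimal Scala sketch. It is not the Vortex API: `TopicSpec`, `Qos`, the policy values and the topic name `TTempSensor` are hypothetical names introduced for illustration. Only the `TemperatureSensor` fields come from the struct above.

```scala
// Hypothetical model of a topic as a (name, type, qos) tuple.
// None of these names are Vortex API; they only mirror the definition above.

// Scala counterpart of the TemperatureSensor struct; sid is the key field
// (an IDL long is 32 bits wide, hence Int).
final case class TemperatureSensor(sid: Int, temp: Float, hum: Float)

// Two example non-functional policies (assumed values, for illustration only).
sealed trait Reliability
case object BestEffort extends Reliability
case object Reliable extends Reliability

sealed trait Durability
case object Volatile extends Durability
case object Persistent extends Durability

final case class Qos(reliability: Reliability, durability: Durability)

// The topic descriptor: the data type is carried as a type parameter.
final case class TopicSpec[T](name: String, qos: Qos)

// "TTempSensor" is an assumed topic name.
val tempTopic: TopicSpec[TemperatureSensor] =
  TopicSpec[TemperatureSensor]("TTempSensor", Qos(Reliable, Persistent))

val sample = TemperatureSensor(sid = 1, temp = 21.5f, hum = 0.4f)
```

Carrying the type as a type parameter means a writer or reader built from `tempTopic` could only accept `TemperatureSensor` values, which is one way to read "the programming language type associated with the topic".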

Page 18: Topic and Instances

As explained in the previous slide, a topic defines a class/type of information

Topics can be defined as Singleton or can have multiple Instances

Topic Instances are identified by means of the topic key

A Topic Key is defined by a tuple of attributes, as in databases

Remarks:

- A Singleton topic has a single domain-wide instance

- A “regular” Topic can have as many instances as the number of different key values, e.g., if the key is an 8-bit character then the topic can have 256 different instances
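
The relation between keys and instances can be sketched with plain Scala collections. This is only an illustration: `Sample` and `instances` are hypothetical names, and grouping a sequence stands in for whatever the middleware does internally.

```scala
// Hypothetical sample type: `key` plays the role of the topic key.
final case class Sample(key: Byte, value: Double)

// One instance per distinct key value: group the samples by their key.
def instances(samples: Seq[Sample]): Map[Byte, Seq[Sample]] =
  samples.groupBy(_.key)

val samples = Seq(Sample(1, 20.0), Sample(2, 18.5), Sample(1, 20.3))
val byInstance = instances(samples)

// An 8-bit key has 2^8 distinct values, so at most 256 instances.
val maxInstances = Byte.MaxValue - Byte.MinValue + 1
```

Here `byInstance` holds two instances (keys 1 and 2), and the instance for key 1 has received two samples.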

Page 20: Quality of Service

For data to flow from a DataWriter (DW) to one or more DataReaders (DR), a few conditions have to hold:

- The DR and DW domain participants have to be in the same domain

- The partition expression of the DR’s Subscriber and the DW’s Publisher should match (in terms of regular expression match)

- The QoS Policies offered by the DW should match or exceed those requested by the DR

[Figure: Quality of Service. DomainParticipants join a Domain Id; a Publisher produces-in and a Subscriber consumes-from a PARTITION; a DataWriter writes a Topic and a DataReader reads it. The DataWriter’s offered QoS is compared with the DataReader’s requested QoS (RxO QoS Policies). Policies shown: DURABILITY, OWNERSHIP, DEADLINE, LATENCY BUDGET, LIVELINESS, RELIABILITY, DEST. ORDER.]
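
The three conditions can be written down as a single predicate. The sketch below is a simplified model, not the DDS or Vortex API: `Endpoint`, `Reliability`, `partitionsMatch` and `canCommunicate` are hypothetical names, a single policy stands in for the whole QoS set, and the partition check follows the slide's wording by treating either expression as a regular expression.

```scala
// Hypothetical model of the three matching conditions; not the DDS or Vortex API.

// A single policy, ordered so that a higher level is a stronger guarantee.
final case class Reliability(level: Int)
val BestEffort = Reliability(0)
val Reliable   = Reliability(1)

final case class Endpoint(domainId: Int, partition: String, reliability: Reliability)

// Partition match treated as a regular-expression match, as stated on the slide.
// Either side may carry the expression.
def partitionsMatch(a: String, b: String): Boolean =
  a == b || a.matches(b) || b.matches(a)

// Data flows only if the domain, partition and QoS conditions all hold.
def canCommunicate(writer: Endpoint, reader: Endpoint): Boolean =
  writer.domainId == reader.domainId &&
    partitionsMatch(writer.partition, reader.partition) &&
    writer.reliability.level >= reader.reliability.level

val dw = Endpoint(0, "sensors.kitchen", Reliable)
val dr = Endpoint(0, "sensors\\..*", BestEffort)
```

With these values `canCommunicate(dw, dr)` holds: same domain, the reader's expression covers the writer's partition, and the writer offers more than the reader requests. Swapping the reliability levels, or moving the reader to another domain, makes it fail.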

Page 21: Step 2: Reactive Architectures with Vortex

Page 25: Reactive vs. Interactive Systems

The term reactive systems is increasingly used to denote interactive systems with stringent response time, performance and scalability constraints.

Reactive: react to an environment which “cannot wait”, e.g. reacting late has harsh consequences

- response time needs to be deterministic

- distributed concurrent systems

- data-centric

Interactive: react to an environment which “doesn’t like to wait”, e.g. reacting late induces QoS degradation

- response time needs to be acceptable

- distributed concurrent systems

- data-centric

Page 26: Reactive Systems Key Traits

Concurrency: Aside from the concurrency between the system and its environment, it is natural to decompose these systems as an ensemble of concurrent components that cooperate to achieve the intended behaviour

Strict Temporal Requirements: Reactive Systems have strict requirements with respect to the rate at which they need to process input as well as the response time for their outputs

Determinism: The output of these systems is generally determined by the input and its occurrence time, e.g. no scheduling effects on the temporal properties of the output

Reliability: Reactive Systems are often life/mission critical; as such, reliability is of utmost importance

Page 27: Architectural Styles for Reactive Systems

Page 31: Stream Processing with Vortex

Page 35: Vortex Architectural Benefits

Page 44: Vortex and ReactiveX

Page 46: The Callback Hell

The Inversion of Control induced by callback-based code is known to make applications brittle and harder to understand

As a mental exercise, think about the callback-style code you’d have to write in Java for drawing when the mouse is being dragged. Although the logic is simple, it is a far cry from having a “declarative style”

The problem of callback management, infamously known as Callback Hell, has become even more important due to the surge of Reactive Systems…

Can we escape the Callback Hell?
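
The mouse-drag exercise can be sketched as follows, in Scala rather than Java and without any GUI toolkit. `DragDrawer`, `Point` and the three `onMouse…` callbacks are hypothetical names. The point is only to show how simple logic ends up spread across several callbacks that share mutable state.

```scala
// Hypothetical callback-style drag drawing; no GUI toolkit is involved.
final case class Point(x: Int, y: Int)

class DragDrawer {
  // Mutable state shared between three separate callbacks.
  private var dragging = false
  private var last: Option[Point] = None
  private var segments = Vector.empty[(Point, Point)]

  def onMouseDown(p: Point): Unit = { dragging = true; last = Some(p) }

  def onMouseMove(p: Point): Unit =
    if (dragging) {
      // Draw a segment from the previous position to the current one.
      last.foreach(prev => segments = segments :+ (prev -> p))
      last = Some(p)
    }

  def onMouseUp(p: Point): Unit = { dragging = false; last = None }

  def drawn: Vector[(Point, Point)] = segments
}

val drawer = new DragDrawer
drawer.onMouseMove(Point(0, 0)) // ignored: the button is not pressed yet
drawer.onMouseDown(Point(1, 1))
drawer.onMouseMove(Point(2, 2))
drawer.onMouseMove(Point(3, 2))
drawer.onMouseUp(Point(3, 2))
```

The rule "draw while the button is down" is never stated in one place. It has to be reconstructed from the `dragging` flag and the order in which the callbacks fire.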

Page 47: Reactive Programming

Reactive Programming is a paradigm based on a relaxed form of Synchronous Dataflow Programming

Reactive Programming is built around the notion of continuous time-varying values and propagation of change

Reactive Programming facilitates the declarative development of non-blocking event-driven applications: in essence, you express what should be done and the runtime decides when to do it

Reactive Programming was popularised in the context of Functional Programming Languages by the seminal work done by Elliott and Hudak in 1997 as part of Fran, a framework for composing richly interactive multimedia animations

Page 50: ReactiveX Observables

ReactiveX Observables allow the composition of flows and sequences of asynchronous data

You can think of Observables as a kind of asynchronous collection that pushes data to you

In essence, an observable can be created from any synchronous or asynchronous stream of data.

- you can create an observable that represents the data coming into a Vortex Data Reader, the events from a Button, or even time

- you can also create an observable that represents an actual container
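
The idea of a collection that pushes data to you can be illustrated with a deliberately tiny, hand-rolled type. This is a sketch and not the ReactiveX/RxScala `Observable`: `MiniObservable` and `fromIterable` are hypothetical names, and error handling, completion and unsubscription are left out.

```scala
import scala.collection.mutable.ListBuffer

// A tiny push-based sequence, NOT the ReactiveX/RxScala Observable.
// `source` is a function that pushes every value into the callback it is given.
final class MiniObservable[A](source: (A => Unit) => Unit) {
  // Subscribing hands the source a callback; the source pushes values into it.
  def subscribe(onNext: A => Unit): Unit = source(onNext)

  // Operators build a new observable that transforms values on their way through.
  def map[B](f: A => B): MiniObservable[B] =
    new MiniObservable[B](onNext => source(a => onNext(f(a))))

  def filter(p: A => Boolean): MiniObservable[A] =
    new MiniObservable[A](onNext => source(a => if (p(a)) onNext(a)))
}

// An observable that represents an actual container.
def fromIterable[A](items: Iterable[A]): MiniObservable[A] =
  new MiniObservable[A](onNext => items.foreach(onNext))

val received = ListBuffer.empty[Int]
fromIterable(List(1, 2, 3, 4))
  .filter(_ % 2 == 0)
  .map(_ * 10)
  .subscribe(x => received += x)
```

Nothing happens until `subscribe` is called. From then on the source drives the computation and the subscriber only reacts, which is the inverse of pulling values out of an iterator.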

Page 53: Vortex Observables

Besides the standard ReactiveX/RxScala Observable, each framework that wants to integrate with ReactiveX/RxScala has to provide its own factory methods to create observables

For Vortex you have to use the DdsObservable, defined as follows:

object DdsObservable {

   def fromDataReaderData[T](dr: DataReader[T]): Observable[T]

   def fromDataReaderEvents[T](dr: DataReader[T]): Observable[ReaderEvent[T]]

   // more methods which we’ll ignore for the time being

}

Page 54: Coding Lab

Page 60: Midpoint Triangle

val circles = VortexObservable.fromDataReaderData {
    DataReader[ShapeType](Topic[ShapeType](circle))
}
val squares = VortexObservable.fromDataReaderData {
    DataReader[ShapeType](Topic[ShapeType](square))
}
val ttopic = Topic[ShapeType](triangle)
val tdw = DataWriter[ShapeType](ttopic)

// Compute the average between circle and square of matching color with flatMap
val triangles = circles.flatMap {
    c => squares.dropWhile(_.color != c.color).take(1).map {
        s => new ShapeType(s.color, (s.x + c.x)/2, (s.y + c.y)/2, (s.shapesize + c.shapesize)/4)
    }
}

triangles.subscribe(tdw.write(_))

Page 61: Using for-comprehension

// Compute the average between circle and square of matching colour with flatMap
val triangles = circles.flatMap {
    c => squares.dropWhile(_.color != c.color).take(1).map {
        s => new ShapeType(s.color, (s.x + c.x)/2, (s.y + c.y)/2, (s.shapesize + c.shapesize)/4)
    }
}

// Compute the average between circle and square of matching colour with for comprehension
val triangles = for {
    c <- circles;
    s <- squares.dropWhile(_.color != c.color).take(1)
} yield new ShapeType(s.color, (s.x + c.x)/2, (s.y + c.y)/2, (s.shapesize + c.shapesize)/4)
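
That the two forms are equivalent can be checked without Vortex or RxScala, since Scala rewrites a for-comprehension into `flatMap` and `map` calls on whatever type is used. The sketch below substitutes plain `List`s for the Observables. `Shape` and the sample values are made up for illustration, and unlike the live streams on the slide these lists are finite and already complete.

```scala
// Plain Scala collections stand in for the Observables; Shape is a hypothetical type.
final case class Shape(color: String, x: Int, y: Int, size: Int)

val circles = List(Shape("RED", 0, 0, 40), Shape("BLUE", 10, 10, 40))
val squares = List(Shape("BLUE", 30, 50, 40), Shape("RED", 20, 40, 40))

// Explicit flatMap/map chain.
val viaFlatMap = circles.flatMap { c =>
  squares.dropWhile(_.color != c.color).take(1).map { s =>
    Shape(s.color, (s.x + c.x) / 2, (s.y + c.y) / 2, (s.size + c.size) / 4)
  }
}

// The same computation as a for-comprehension; the compiler rewrites it
// into the flatMap/map chain above.
val viaFor = for {
  c <- circles
  s <- squares.dropWhile(_.color != c.color).take(1)
} yield Shape(s.color, (s.x + c.x) / 2, (s.y + c.y) / 2, (s.size + c.size) / 4)
```

Both values contain the same two shapes, one per circle, each placed at the midpoint between the circle and the first square of the same colour.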

Page 64: Big Data with Vortex and Spark

Page 71: Live Spark Lab