Spark Summit - Stratio Streaming

Stratio Streaming combines the power of Spark Streaming as a continuous computing framework with the Siddhi CEP engine as a complex event processing engine.

Transcript of Spark Summit - Stratio Streaming

  • 1. Stratio is the only Big Data platform able to combine, in one query, stored data with streaming data in real time (in less than 30 seconds). We are polyglots as well: we use Spark over two NoSQL databases, Cassandra and MongoDB.

  • 2. Spark Streaming provides a high-level abstraction called a discretized stream, or DStream, which represents a continuous stream of data. Internally, a DStream is represented as a sequence of RDDs, Spark's abstraction of an immutable, distributed dataset. Spark components: Shark (SQL), Spark Streaming, MLlib (machine learning), GraphX (graph).

  • 3. DStream operations: map(func), flatMap(func), filter(func), count(), repartition(numPartitions), union(otherStream), reduce(func), countByValue(), reduceByKey(func, [numTasks]), join(otherStream, [numTasks]), cogroup(otherStream, [numTasks]), transform(func), updateStateByKey(func), window(windowLength, slideInterval), countByWindow(windowLength, slideInterval), reduceByWindow(func, windowLength, slideInterval), reduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks]), countByValueAndWindow(windowLength, slideInterval, [numTasks]), print(), foreachRDD(func), saveAsObjectFiles(prefix, [suffix]), saveAsTextFiles(prefix, [suffix]), saveAsHadoopFiles(prefix, [suffix]).

  • 4. Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. CEP as a technique helps discover complex events by analyzing and correlating other events.

  • 5. A CEP engine should provide operators over streams, keeping in mind that events and streams in a CEP engine are first-class citizens. In CEP, we think in terms of event streams: an event stream is a sequence of events that arrive over time. Users provide queries to the CEP engine, whose main mission is matching those queries against events coming through event streams. A CEP engine thus has a notion of time, and it allows working with temporal queries that reason in terms of temporal concepts, such as time windows or before-and-after event relationships, among others. Operators: Filter, Join, Aggregation (Avg, Sum, Min, Max, Custom), Group by, Having, Conditions and Expressions (and, or, not, true/false, ==, !=, >=, >,
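
The ideas of a stream as a sequence of batches, a sliding window such as window(windowLength, slideInterval), and the CEP operators Filter, Aggregation (Avg), Group by and Having can be illustrated with a small model in plain Python. This is only a sketch under stated assumptions: it does not use Spark Streaming or Siddhi. The names `window`, `avg_by_key` and `query`, the `(key, value)` event shape, and the sensor data are all hypothetical. Window length and slide interval are counted in batches here rather than in time durations.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Event = Tuple[str, float]  # hypothetical event shape: (key, value)
Batch = List[Event]        # one micro-batch, standing in for one RDD of a DStream


def window(batches: List[Batch], window_length: int, slide_interval: int) -> Iterator[Batch]:
    """Every `slide_interval` batches, yield the events of the last `window_length` batches."""
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        yield [event for batch in batches[start:end] for event in batch]


def avg_by_key(events: Iterable[Event]) -> Dict[str, float]:
    """Group events by key and compute the average value per key."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for key, value in events:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def query(
    batches: List[Batch],
    window_length: int,
    slide_interval: int,
    event_filter: Callable[[Event], bool],
    having: Callable[[float], bool],
) -> List[Dict[str, float]]:
    """Filter -> group by key -> avg -> having, evaluated once per window."""
    results = []
    for windowed in window(batches, window_length, slide_interval):
        kept = [event for event in windowed if event_filter(event)]
        averages = avg_by_key(kept)
        results.append({k: v for k, v in averages.items() if having(v)})
    return results


# Made-up sample stream: four micro-batches of sensor readings.
batches = [
    [("sensor-a", 10.0), ("sensor-b", 50.0)],
    [("sensor-a", 30.0), ("sensor-b", -1.0)],
    [("sensor-a", 50.0), ("sensor-b", 70.0)],
    [("sensor-a", 70.0)],
]

# Drop negative readings, then report keys whose windowed average exceeds 35.
alerts = query(
    batches,
    window_length=2,
    slide_interval=1,
    event_filter=lambda event: event[1] >= 0,
    having=lambda avg: avg > 35.0,
)
print(alerts)
```

In a real deployment the windowing would be handled by the streaming framework and the query would be expressed in the CEP engine's own query language; the model above only shows how the operators compose over a windowed stream.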