Data in Motion: Streaming Static Data Efficiently
Transcript of Data in Motion: Streaming Static Data Efficiently
![Page 1: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/1.jpg)
MANCHESTER LONDON NEW YORK
![Page 2: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/2.jpg)
Martin Zapletal @zapletal_martin #ScalaDays
Data in Motion: Streaming Static Data Efficiently in Akka Persistence (and elsewhere)
@cakesolutions
![Page 3: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/3.jpg)
Databases
![Page 4: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/4.jpg)
Batch processing
![Page 5: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/5.jpg)
Data at scale
● Reactive
● Real time, asynchronous and message driven
● Elastic and scalable
● Resilient and fault tolerant
![Page 6: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/6.jpg)
Streams
![Page 7: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/7.jpg)
Streaming static data
● Turning database into a stream
![Page 8: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/8.jpg)
Pulling data from source
[Diagram: a source holding rows 0, 5 and 10]
![Page 9: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/9.jpg)
![Page 10: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/10.jpg)
![Page 11: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/11.jpg)
![Page 12: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/12.jpg)
Inserts
[Diagram: row 1 is inserted into the source that holds rows 0, 5 and 10]
![Page 13: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/13.jpg)
Updates
[Diagram: row 5 of the source is updated to 55]
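One reading of the Inserts and Updates slides is that polling by offset can miss changes made behind the offset. The sketch below illustrates that reading with plain Scala collections only. `PollingSketch`, `Row`, `poll` and `pollAll` are hypothetical names, and the in-memory `Map` is an assumed stand-in for a database table.

```scala
object PollingSketch {
  // Hypothetical row type; the Map[Int, Int] below stands in for a database table.
  final case class Row(key: Int, value: Int)

  // One poll: return the rows whose key is greater than the last key already seen.
  def poll(table: Map[Int, Int], lastKey: Int): List[Row] =
    table.toList.collect { case (k, v) if k > lastKey => Row(k, v) }.sortBy(_.key)

  // Poll a sequence of table versions in turn, remembering the highest key emitted so far.
  def pollAll(versions: List[Map[Int, Int]]): List[Row] = {
    val (_, emitted) = versions.foldLeft((Int.MinValue, List.empty[Row])) {
      case ((lastKey, acc), table) =>
        val fresh = poll(table, lastKey)
        val newLast = fresh.lastOption.map(_.key).getOrElse(lastKey)
        (newLast, acc ++ fresh)
    }
    emitted
  }
}
```

With the versions `{0, 5, 10}`, then the same table plus an inserted key 1, then key 5 updated to 55, `pollAll` emits only the first three rows. Both the insert and the update sit at or below the remembered offset 10, so this naive poller never sees them.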
![Page 14: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/14.jpg)
Pushing data from source
● Change log, change data capture
[Diagram: a source holding rows 0, 5 and 10]
![Page 15: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/15.jpg)
![Page 16: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/16.jpg)
![Page 17: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/17.jpg)
Infinite streams of finite data source
● Consistent snapshot and change log
[Diagram: a snapshot of rows 0, 5 and 10, followed by the change-log entry for row 1]
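A minimal sketch of the "consistent snapshot and change log" idea, under assumptions: the snapshot records the log offset it was taken at, and the stream is the snapshot rows followed by every change logged after that offset. `SnapshotThenChanges`, `Change`, `Snapshot` and `stream` are hypothetical names, not part of Akka or of the talk's code.

```scala
object SnapshotThenChanges {
  // A change-log entry: an increasing offset plus the key and value that were written.
  final case class Change(offset: Long, key: Int, value: Int)

  // The table contents as of a given log offset (inclusive).
  final case class Snapshot(asOfOffset: Long, rows: Map[Int, Int])

  // Emit the snapshot rows first, then every logged change after the snapshot offset.
  // Both parts are lazy iterators, so the log may be unbounded.
  def stream(snapshot: Snapshot, log: Iterator[Change]): Iterator[(Int, Int)] = {
    val initial = snapshot.rows.toList.sortBy(_._1).iterator
    val tail = log.filter(_.offset > snapshot.asOfOffset).map(c => (c.key, c.value))
    initial ++ tail
  }
}
```

Filtering on the snapshot offset is what keeps the two parts consistent in this sketch: changes already reflected in the snapshot are not emitted a second time.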
![Page 18: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/18.jpg)
Log data structure
offset 0: 0, Inserted value 0
offset 1: 5, Inserted value 5
offset 2: 10, Inserted value 10
offset 3: 1, Inserted value 1
offset 4: 5, Inserted value 55
![Page 19: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/19.jpg)
Pulling data from a log
[Diagram: a consumer pulling entries 0, 5 and 10 from the log in order]
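Pulling from a log can be sketched as reading from a remembered offset. The example below is an illustration using an immutable in-memory log; `AppendOnlyLog`, `Log`, `append` and `readFrom` are hypothetical names chosen for this sketch.

```scala
object AppendOnlyLog {
  // Immutable append-only log; an entry's offset is its index in the vector.
  final case class Log[A](entries: Vector[A] = Vector.empty[A]) {
    // Appending never changes existing offsets.
    def append(a: A): Log[A] = copy(entries = entries :+ a)

    // Return the entries at positions >= fromOffset and the offset to read from next time.
    def readFrom(fromOffset: Int): (Vector[A], Int) =
      (entries.drop(fromOffset), math.max(fromOffset, entries.size))
  }
}
```

Because entries are only ever appended, a reader that stores the returned offset and passes it to the next `readFrom` call sees every entry exactly once, including ones written after its previous read.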
![Page 20: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/20.jpg)
![Page 21: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/21.jpg)
[Diagram: entry 15 is appended to the log and pulled after 0, 5 and 10]
![Page 22: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/22.jpg)
Akka Persistence
[Diagram: events 1, 2, 3 and 4 of persistence_id1 in the journal]
![Page 23: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/23.jpg)
Akka Persistence Query
● eventsByPersistenceId, allPersistenceIds, eventsByTag
![Page 24: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/24.jpg)
Akka Persistence Query Cassandra
● Purely pull
● Event (log) data
persistence_id 0, partition_nr 0: event 0, event 1, event 2
persistence_id 0, partition_nr 1: event 100, event 101, event 102
persistence_id 1, partition_nr 0: event 0, event 1, event 2
![Page 25: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/25.jpg)
Actor publisher
private[query] abstract class QueryActorPublisher[MessageType, State: ClassTag](
    refreshInterval: Option[FiniteDuration])
  extends ActorPublisher[MessageType] {

  // Hooks implemented by each concrete query publisher.
  protected def initialState: Future[State]
  protected def initialQuery(initialState: State): Future[Action]
  protected def requestNext(state: State, resultSet: ResultSet): Future[Action]
  protected def requestNextFinished(state: State, resultSet: ResultSet): Future[Action]
  protected def updateState(state: State, row: Row): (Option[MessageType], State)
  protected def completionCondition(state: State): Boolean

  // Chooses the next actor behavior; database calls are asynchronous and piped back to self.
  private[this] def nextBehavior(...): Receive = {
    if (shouldFetchMore(...)) {
      listenableFutureToFuture(resultSet.fetchMoreResults()).map(FetchedResultSet).pipeTo(self)
      awaiting(resultSet, state, finished)
    } else if (shouldIdle(...)) {
      idle(resultSet, state, finished)
    } else if (shouldComplete(...)) {
      onCompleteThenStop()
      Actor.emptyBehavior
    } else if (shouldRequestMore(...)) {
      if (finished) requestNextFinished(state, resultSet).pipeTo(self)
      else requestNext(state, resultSet).pipeTo(self)
      awaiting(resultSet, state, finished)
    } else {
      idle(resultSet, state, finished)
    }
  }
}
![Page 26: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/26.jpg)
![Page 27: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/27.jpg)
[State diagram labels: initialQuery, initialFinished, initialNewResultSet, newResultSet, fetchedResultSet, finished, shouldFetchMore, shouldIdle, shouldTerminate, shouldRequestMore, request, continue, Cancel, SubscriptionTimeout]
Red transitions: deliver buffer and update internal state (progress)
Blue transitions: asynchronous database query
![Page 28: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/28.jpg)
Events by persistence id
SELECT * FROM ${tableName} WHERE persistence_id = ? AND partition_nr = ? AND sequence_nr >= ? AND sequence_nr <= ?
persistence_id 0, partition_nr 0: event 0, event 1, event 2
persistence_id 0, partition_nr 1: event 100, event 101, event 102
![Page 29: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/29.jpg)
![Page 30: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/30.jpg)
![Page 31: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/31.jpg)
![Page 32: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/32.jpg)
![Page 33: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/33.jpg)
![Page 34: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/34.jpg)
![Page 35: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/35.jpg)
private[query] class EventsByPersistenceIdPublisher(...)
  extends QueryActorPublisher[PersistentRepr, EventsByPersistenceIdState](...) {

  override protected def initialState: Future[EventsByPersistenceIdState] = {
    ...
    EventsByPersistenceIdState(initialFromSequenceNr, 0, currentPnr)
  }

  override protected def updateState(
      state: EventsByPersistenceIdState,
      row: Row): (Option[PersistentRepr], EventsByPersistenceIdState) = {
    val event = extractEvent(row)
    val partitionNr = row.getLong("partition_nr") + 1

    // Emit the event and advance the state to the next expected sequence number.
    (Some(event), EventsByPersistenceIdState(event.sequenceNr + 1, state.count + 1, partitionNr))
  }
}
![Page 36: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/36.jpg)
![Page 37: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/37.jpg)
All persistence ids
SELECT DISTINCT persistence_id, partition_nr FROM $tableName
persistence_id 0, partition_nr 0: event 0, event 1, event 2
persistence_id 0, partition_nr 1: event 100, event 101, event 102
persistence_id 1, partition_nr 0: event 0, event 1, event 2
![Page 38: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/38.jpg)
![Page 39: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/39.jpg)
![Page 40: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/40.jpg)
![Page 41: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/41.jpg)
private[query] class AllPersistenceIdsPublisher(...)
  extends QueryActorPublisher[String, AllPersistenceIdsState](...) {

  override protected def initialState: Future[AllPersistenceIdsState] =
    Future.successful(AllPersistenceIdsState(Set.empty))

  override protected def updateState(
      state: AllPersistenceIdsState,
      row: Row): (Option[String], AllPersistenceIdsState) = {

    val event = row.getString("persistence_id")

    // Emit a persistence id only the first time it is seen.
    if (state.knownPersistenceIds.contains(event)) {
      (None, state)
    } else {
      (Some(event), state.copy(knownPersistenceIds = state.knownPersistenceIds + event))
    }
  }
}
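The de-duplication in `updateState` can be tried without Akka or Cassandra. The sketch below keeps the same shape (a state holding the known ids, and a function returning an optional element plus the next state) and folds it over a list of ids. `DistinctIdsSketch`, `State` and `emitted` are hypothetical names, and each list element stands in for one row of the `SELECT DISTINCT persistence_id, partition_nr` result.

```scala
object DistinctIdsSketch {
  // State mirrors the slide: the set of persistence ids already emitted.
  final case class State(known: Set[String])

  // Same shape as updateState on the slide: emit the id only the first time it is seen.
  def updateState(state: State, persistenceId: String): (Option[String], State) =
    if (state.known.contains(persistenceId)) (None, state)
    else (Some(persistenceId), state.copy(known = state.known + persistenceId))

  // Fold the rows through updateState and collect whatever was emitted, in order.
  def emitted(rows: List[String]): List[String] = {
    val (out, _) = rows.foldLeft((Vector.empty[String], State(Set.empty))) {
      case ((acc, st), id) =>
        val (maybe, next) = updateState(st, id)
        (acc ++ maybe, next)
    }
    out.toList
  }
}
```

An id that spans two partitions appears in two rows of the query result, which is why the state is needed: `emitted(List("0", "0", "1"))` yields `List("0", "1")`.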
![Page 42: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/42.jpg)
![Page 43: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/43.jpg)
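Events by tag
persistence_id 0, partition_nr 0: event 0; event 1, tag 1; event 2, tag 1
persistence_id 0, partition_nr 1: event 100, tag 1; event 101; event 102
persistence_id 1, partition_nr 0: event 0; event 1; event 2, tag 1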
![Page 44: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/44.jpg)
![Page 45: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/45.jpg)
![Page 46: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/46.jpg)
![Page 47: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/47.jpg)
![Page 48: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/48.jpg)
![Page 49: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/49.jpg)
![Page 50: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/50.jpg)
![Page 51: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/51.jpg)
![Page 52: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/52.jpg)
![Page 53: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/53.jpg)
Events by tag
SELECT * FROM $eventsByTagViewName$tagId WHERE tag$tagId = ? AND timebucket = ? AND timestamp > ? AND timestamp <= ? ORDER BY timestamp ASC LIMIT ?
[Diagram: the journal table alongside a tag view keyed by tag and time bucket (tag 1, 1/1/2016 and tag 1, 1/2/2016) that holds Id 0, event 1; Id 1, event 2; Id 0, event 100; Id 0, event 2]
![Page 54: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/54.jpg)
![Page 55: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/55.jpg)
![Page 56: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/56.jpg)
![Page 57: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/57.jpg)
![Page 58: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/58.jpg)
![Page 59: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/59.jpg)
[Diagram: the tag view alongside a table tracking the sequence number (seq) last seen for each persistence_id]
![Page 60: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/60.jpg)
![Page 61: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/61.jpg)
![Page 62: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/62.jpg)
seqNumbers match {
  case None =>
    replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
    loop(n - 1)

  case Some(s) =>
    s.isNext(pid, seqNr) match {
      case SequenceNumbers.Yes | SequenceNumbers.PossiblyFirst =>
        seqNumbers = Some(s.updated(pid, seqNr))
        replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
        loop(n - 1)

      case SequenceNumbers.After =>
        replyTo ! ReplayAborted(seqNumbers, pid, s.get(pid) + 1, seqNr)
        // end loop

      case SequenceNumbers.Before =>
        // duplicate, discard
        if (!backtracking)
          log.debug(s"Discarding duplicate. Got sequence number [$seqNr] for [$pid], " +
            s"but current sequence number is [${s.get(pid)}]")
        loop(n - 1)
    }
}
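The four outcomes matched above can be illustrated with a simplified classifier. This is a sketch under assumptions and not the library's `SequenceNumbers` class: `SeqNrCheckSketch` and `SeqNumbers` are hypothetical, and the rules are inferred from how the slide handles each case (an expected value of `s.get(pid) + 1`, `After` aborting the replay, `Before` treated as a duplicate). The meaning given to `PossiblyFirst` in particular is a guess.

```scala
object SeqNrCheckSketch {
  sealed trait Answer
  case object Yes extends Answer
  case object PossiblyFirst extends Answer
  case object After extends Answer
  case object Before extends Answer

  // Hypothetical stand-in: highest sequence number seen so far per persistence id.
  final case class SeqNumbers(highest: Map[String, Long] = Map.empty) {
    def get(pid: String): Long = highest.getOrElse(pid, 0L)

    def updated(pid: String, seqNr: Long): SeqNumbers =
      copy(highest = highest.updated(pid, seqNr))

    def isNext(pid: String, seqNr: Long): Answer = {
      val expected = get(pid) + 1
      // Nothing recorded for pid yet and the event is not number 1:
      // it may still be the first tagged event, so it cannot be rejected (assumption).
      if (expected == 1L && seqNr != 1L) PossiblyFirst
      else if (seqNr == expected) Yes
      else if (seqNr > expected) After // gap: an earlier event has not been seen yet
      else Before // at or below the highest seen: duplicate
    }
  }
}
```

For example, after `updated("a", 2)` the sequence number 3 classifies as `Yes`, 5 as `After` and 2 as `Before`.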
![Page 63: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/63.jpg)
![Page 64: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/64.jpg)
def replay(): Unit = {
  val backtracking = isBacktracking
  val limit =
    if (backtracking) maxBufferSize
    else maxBufferSize - buf.size
  val toOffs =
    if (backtracking && abortDeadline.isEmpty) highestOffset
    else UUIDs.endOf(System.currentTimeMillis() - eventualConsistencyDelayMillis)
  context.actorOf(EventsByTagFetcher.props(tag, currTimeBucket, currOffset, toOffs, limit,
    backtracking, self, session, preparedSelect, seqNumbers, settings))
  context.become(replaying(limit))
}

def replaying(limit: Int): Receive = {
  case env @ UUIDPersistentRepr(offs, _) => // Deliver buffer
  case ReplayDone(count, seqN, highest) => // Request more
  case ReplayAborted(seqN, pid, expectedSeqNr, gotSeqNr) =>
    // Causality violation, wait and retry. Only applicable if all events for persistence_id are tagged
  case ReplayFailed(cause) => // Failure
  case _: Request => // Deliver buffer
  case Continue => // Do nothing
  case Cancel => // Stop
}
![Page 65: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/65.jpg)
![Page 66: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/66.jpg)
Akka Persistence Cassandra Replay
def asyncReplayMessages(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long, max: Long)(
    replayCallback: (PersistentRepr) => Unit): Future[Unit] = Future {
  new MessageIterator(persistenceId, fromSequenceNr, toSequenceNr, max).foreach(msg => {
    replayCallback(msg)
  })
}

class MessageIterator(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long, max: Long)
  extends Iterator[PersistentRepr] {
  private val initialFromSequenceNr =
    math.max(highestDeletedSequenceNumber(persistenceId) + 1, fromSequenceNr)
  private val iter = new RowIterator(persistenceId, initialFromSequenceNr, toSequenceNr)
  private var mcnt = 0L
  private var c: PersistentRepr = null
  private var n: PersistentRepr = PersistentRepr(Undefined)
  fetch()
  def hasNext: Boolean = ...
  def next(): PersistentRepr = …
  ...
}
![Page 67: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/67.jpg)
![Page 68: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/68.jpg)
![Page 69: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/69.jpg)
class RowIterator(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long)
  extends Iterator[Row] {
  var currentPnr = partitionNr(fromSequenceNr)
  var currentSnr = fromSequenceNr
  var fromSnr = fromSequenceNr
  var toSnr = toSequenceNr
  var iter = newIter()

  // Blocking query for the current partition.
  def newIter() =
    session.execute(preparedSelectMessages.bind(persistenceId, currentPnr, fromSnr, toSnr)).iterator

  final def hasNext: Boolean = {
    if (iter.hasNext) {
      true
    } else if (!inUse) {
      false
    } else {
      // Current partition is exhausted: move on to the next one.
      currentPnr += 1
      fromSnr = currentSnr
      iter = newIter()
      hasNext
    }
  }

  def next(): Row = {
    val row = iter.next()
    currentSnr = row.getLong("sequence_nr")
    row
  }
}
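The partition walk in `RowIterator` can be sketched in memory: read one partition, and when it is exhausted, move to the next partition number. The example below is an illustration only. `PartitionWalkSketch` and `rows` are hypothetical names, the `Map` stands in for the Cassandra table, and stopping at the first empty partition is an assumed simplification of the slide's `inUse` check.

```scala
object PartitionWalkSketch {
  // Hypothetical storage: partition number -> sequence numbers stored in it (ascending).
  def rows(
      partitions: Map[Long, Vector[Long]],
      fromSnr: Long,
      toSnr: Long,
      startPnr: Long): Iterator[Long] =
    Iterator
      .iterate(startPnr)(_ + 1) // partition numbers startPnr, startPnr + 1, ...
      .map(pnr => partitions.getOrElse(pnr, Vector.empty))
      .takeWhile(_.nonEmpty) // stop at the first partition that holds no rows
      .flatMap(_.iterator)
      .filter(snr => snr >= fromSnr && snr <= toSnr)
}
```

With partition 0 holding sequence numbers 1 to 3 and partition 1 holding 100 to 102, a walk from partition 0 over the range 2 to 101 yields 2, 3, 100, 101. The iterator is lazy, so later partitions are only touched once earlier ones are consumed.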
![Page 70: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/70.jpg)
![Page 71: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/71.jpg)
![Page 72: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/72.jpg)
Non blocking asynchronous replay
private[this] val queries: CassandraReadJournal =
  new CassandraReadJournal(
    extendedActorSystem,
    context.system.settings.config.getConfig("cassandra-query-journal"))

override def asyncReplayMessages(
    persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long, max: Long)(
    replayCallback: (PersistentRepr) => Unit): Future[Unit] =
  queries
    .eventsByPersistenceId(
      persistenceId, fromSequenceNr, toSequenceNr, max,
      replayMaxResultSize, None, "asyncReplayMessages")
    .runForeach(replayCallback)
    .map(_ => ())
![Page 73: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/73.jpg)
![Page 74: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/74.jpg)
Benchmarks
[Figure: three plots comparing blocking and asynchronous replay. Panels: Replay, Strong scaling, Weak scaling. Vertical axes: Time (s). Horizontal axes: Actors; Threads, Actors; Threads.]
![Page 75: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/75.jpg)
Alternative architecture
[Diagram: events partitioned by node_id (0, 1). Events shown, in order: persistence_id 0, event 0; persistence_id 0, event 1; persistence_id 1, event 0; persistence_id 0, event 2; persistence_id 2, event 0; persistence_id 0, event 3.]
![Page 76: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/76.jpg)
[Diagram: the same events in a second ordering: persistence_id 0, event 0; persistence_id 0, event 1; persistence_id 1, event 0; persistence_id 2, event 0; persistence_id 0, event 2; persistence_id 0, event 3.]
![Page 77: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/77.jpg)
[Diagram: tables labelled "tag 1" and "allIds", with entries "Id 0, event 1" and "Id 2, event 1", and a per-Id table with entries "event 0" and "event 1".]
![Page 78: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/78.jpg)
[Diagram: the node_id event logs (both orderings of the Id 0, Id 1 and Id 2 events) shown together with the "tag 1", "allIds" and per-Id tables.]
![Page 79: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/79.jpg)
[Diagram: the "tag 1", "allIds" and per-Id tables.]
val boundStatements = statementGroup(eventsByPersistenceId, eventsByTag, allPersistenceIds)

Future.sequence(boundStatements).flatMap { stmts =>
  val batch = new BatchStatement().setConsistencyLevel(...).setRetryPolicy(...)
  stmts.foreach(batch.add)
  session.underlying().flatMap(_.executeAsync(batch))
}
![Page 80: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/80.jpg)
![Page 81: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/81.jpg)
val eventsByPersistenceIdStatement = statementGroup(eventsByPersistenceIdStatement)
val boundStatements = statementGroup(eventsByTagStatement, allPersistenceIdsStatement)
...
session.underlying().flatMap { s =>
  val ebpResult = s.executeAsync(eventsByPersistenceIdStatement)
  val batchResult = s.executeAsync(batch)
  ...
}
[Diagram: the "tag 1", "allIds" and per-Id tables.]
![Page 82: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/82.jpg)
![Page 83: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/83.jpg)
Event time processing
● Ingestion time, processing time, event time
![Page 84: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/84.jpg)
![Page 85: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/85.jpg)
Ordering
Order shown: 1, 0, 2
KEY  TIME      VALUE
1    12:34:57  1
2    12:34:58  2
0    12:34:56  0
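The three records on this slide can be used to illustrate ordering by event time rather than by arrival. This is a minimal sketch: the `Record` type and the `arrived` list are illustrative names, and treating the slide's order (1, 2, 0) as arrival order is an assumption.

```scala
import java.time.LocalTime

object EventTimeOrdering {
  final case class Record(key: Int, time: LocalTime, value: Int)

  // Records in the order they are listed on the slide
  // (assumed here to be the arrival order).
  val arrived: List[Record] = List(
    Record(1, LocalTime.parse("12:34:57"), 1),
    Record(2, LocalTime.parse("12:34:58"), 2),
    Record(0, LocalTime.parse("12:34:56"), 0)
  )

  // Reorder by the timestamp carried in the record (event time),
  // independently of when each record was received.
  val byEventTime: List[Record] = arrived.sortBy(_.time.toNanoOfDay)
}
```

Sorting a finite list is only possible because the data is static; an unbounded stream needs a bound on lateness before it can emit anything in event-time order.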
![Page 86: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/86.jpg)
Order shown: 0, 1, 2
KEY  TIME      VALUE
1    12:34:57  1
2    12:34:58  2
0    12:34:56  0
![Page 87: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/87.jpg)
Distributed causal stream merging
[Diagram: event streams for node_id 0 and 1, containing Id 0 events 0 to 3, Id 1 event 0 and Id 2 event 0.]
![Page 88: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/88.jpg)
[Diagram: the node_id 0 and 1 event streams; merged stream so far: Id 0, event 0.]
![Page 89: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/89.jpg)
![Page 90: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/90.jpg)
[Diagram: the node_id 0 and 1 event streams; merged stream so far: Id 0, event 0.]
persistence_id  seq
0               0
1               . . .
2               . . .
![Page 91: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/91.jpg)
[Diagram: the node_id 0 and 1 event streams; merged stream so far: Id 0, event 0; Id 0, event 1.]
persistence_id  seq
0               1
1               . . .
2               . . .
![Page 92: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/92.jpg)
[Diagram: the node_id 0 and 1 event streams and the merged stream after further events have been consumed.]
persistence_id  seq
0               2
1               0
2               0
![Page 93: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/93.jpg)
[Diagram: the node_id 0 and 1 event streams; merged stream: Id 0, event 0; Id 0, event 1; Id 2, event 0; Id 0, event 2; Id 0, event 3; Id 1, event 0.]
persistence_id  seq
0               3
1               0
2               0
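The persistence_id to seq table suggests a merge that only emits an event once all earlier events of the same persistence_id have been emitted. The sketch below illustrates that idea under assumptions: `Event`, `merge` and the in-memory lists are hypothetical, sequence numbers are taken to start at 0 as on the slides, and this is not presented as the talk's implementation.

```scala
object CausalMerge {
  final case class Event(persistenceId: Int, seqNr: Long)

  // Merges per-node streams so that, for every persistenceId, events are
  // emitted in sequence number order (0, 1, 2, ...). An event is taken
  // from the head of a stream only when it is the next one expected.
  def merge(streams: List[List[Event]]): List[Event] = {
    @annotation.tailrec
    def loop(
        remaining: List[List[Event]],
        expected: Map[Int, Long], // persistence_id -> next expected seq
        out: Vector[Event]): Vector[Event] = {
      val idx = remaining.indexWhere(
        _.headOption.exists(e => expected.getOrElse(e.persistenceId, 0L) == e.seqNr))
      if (idx < 0) out // all streams are empty or blocked on a missing event
      else {
        val e = remaining(idx).head
        loop(
          remaining.updated(idx, remaining(idx).tail),
          expected.updated(e.persistenceId, e.seqNr + 1),
          out :+ e)
      }
    }
    loop(streams, Map.empty, Vector.empty).toList
  }
}
```

With node streams `List(Event(0,0), Event(0,1), Event(1,0), Event(0,3))` and `List(Event(2,0), Event(0,2))`, the head `Event(0,3)` is held back until `Event(0,2)` has been taken from the other stream.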
![Page 94: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/94.jpg)
Replay
[Diagram: the node_id 0 and 1 event streams and the merged stream (Id 0, event 0; Id 0, event 1; Id 2, event 0; Id 0, event 2; Id 1, event 0), plus a further stream showing Id 0, event 0 and Id 0, event 1.]
![Page 95: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/95.jpg)
![Page 96: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/96.jpg)
![Page 97: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/97.jpg)
[Diagram: the same streams as on the Replay slide, with the merge state added.]
persistence_id  seq
0               2
![Page 98: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/98.jpg)
[Diagram: the same streams as on the Replay slide, with per-persistence_id and per-stream state added.]
persistence_id  seq
0               2
stream_id  seq
0          1
1          2
![Page 99: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/99.jpg)
Exactly once delivery
![Page 100: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/100.jpg)
[Diagram: event stream: Id 0, event 0; Id 0, event 1; Id 2, event 0; Id 0, event 2; Id 0, event 3; Id 1, event 0.]
![Page 101: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/101.jpg)
![Page 102: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/102.jpg)
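[Diagram: event stream: Id 0, event 0; Id 0, event 1; Id 2, event 0; Id 0, event 2; Id 0, event 3; Id 1, event 0. Second list with five ACKs: Id 0, event 0; Id 0, event 1; Id 2, event 0; Id 0, event 3; Id 1, event 0. Id 0, event 2 does not appear in the second list.]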
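One common way to get exactly-once effects from acknowledgements is to redeliver until an ACK arrives and to deduplicate on the receiving side. The following sketch shows that combination; `Receiver`, `deliver` and the `dropFirstAck` failure simulation are hypothetical names introduced for illustration, not taken from the talk.

```scala
object AckedDelivery {
  final case class Event(persistenceId: Int, seqNr: Long)

  // Receiver that ignores events it has already processed, so redelivery
  // after a lost ACK does not apply an event twice.
  final class Receiver {
    private var seen = Set.empty[Event]
    private var processed = Vector.empty[Event]

    // Processes the event if it is new and returns the ACK.
    def receive(e: Event): Boolean = {
      if (!seen(e)) {
        seen += e
        processed :+= e
      }
      true
    }

    def log: Vector[Event] = processed
  }

  // Sends every event until it is acknowledged. `dropFirstAck` simulates
  // ACKs that are lost on the first attempt. Returns the number of sends.
  def deliver(events: List[Event], receiver: Receiver, dropFirstAck: Set[Event]): Int = {
    var attempts = 0
    events.foreach { e =>
      var acked = false
      var first = true
      while (!acked) {
        attempts += 1
        val ack = receiver.receive(e)
        acked = ack && !(first && dropFirstAck(e))
        first = false
      }
    }
    attempts
  }
}
```

The sender alone gives at-least-once delivery; it is the receiver's memory of processed events that turns duplicates into no-ops.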
![Page 103: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/103.jpg)
![Page 104: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/104.jpg)
![Page 105: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/105.jpg)
Checkpoint data
StateBackend
Source 1: 6791
Source 2: 7252
Source 3: 5589
Source 4: 6843
State 1: ptr 1
State 1: ptr 2
Sink 2: ack!
Sink 2: ack!
![Page 106: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/106.jpg)
class KafkaSource(private var offsetManagers: Map[TopicAndPartition, KafkaOffsetManager]) extends TimeReplayableSource {
  def open(context: TaskContext, startTime: Option[TimeStamp]): Unit = {
    fetch.setStartOffset(topicAndPartition, offsetManager.resolveOffset(time))
    ...
  }
  def read(batchSize: Int): List[Message]
  def close(): Unit
}
![Page 107: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/107.jpg)
![Page 108: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/108.jpg)
class DirectKafkaInputDStream[K, V, U <: Decoder[K]: ClassTag, T <: Decoder[V]: ClassTag, R](
    _ssc: StreamingContext,
    val kafkaParams: Map[String, String],
    val fromOffsets: Map[TopicAndPartition, Long],
    messageHandler: MessageAndMetadata[K, V] => R
  ) extends InputDStream[R](_ssc) with Logging {

  override def compute(validTime: Time): Option[KafkaRDD[K, V, U, T, R]] = {
    val untilOffsets = latestLeaderOffsets(maxRetries)
    ...
  }
}
![Page 109: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/109.jpg)
![Page 110: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/110.jpg)
Exactly once delivery
● Durable offset
[Diagram: log offsets 0 1 2 3 4]
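A durable offset makes restarts safe when the offset is committed together with the result it produced. The sketch below illustrates this with a running sum over a five-entry log; `Store`, `run` and `crashAfter` are hypothetical names, and the in-memory `committed` field merely stands in for an atomic durable write.

```scala
object DurableOffset {
  // Hypothetical durable store: result and offset are saved together, so
  // a restart never observes one without the other.
  final class Store {
    var committed: (Int, Long) = (0, 0L) // (running sum, next offset to read)
  }

  // Processes log entries starting at the committed offset. `crashAfter`
  // simulates a failure after that many entries were handled in this run.
  def run(log: Vector[Int], store: Store, crashAfter: Option[Int] = None): Unit = {
    val (sum0, from) = store.committed
    var sum = sum0
    var offset = from
    var handled = 0
    while (offset < log.length && !crashAfter.contains(handled)) {
      sum += log(offset.toInt)
      offset += 1
      handled += 1
      store.committed = (sum, offset) // single commit of result and offset
    }
  }
}
```

Running with `crashAfter = Some(2)` and then again without it yields the same sum as one uninterrupted run, because the second run resumes at the committed offset instead of offset 0.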
![Page 111: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/111.jpg)
![Page 112: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/112.jpg)
[Diagram: offsets shown as 1 0 2 3 4]
![Page 113: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/113.jpg)
[Diagram: offsets shown as 1 0 3 4 2]
![Page 114: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/114.jpg)
Optimisation
[Diagram: three stream sources and nine workers; operators shown: select, map, filter.]
![Page 115: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/115.jpg)
[Diagram: three stream sources and five workers; operators shown: select where.]
![Page 116: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/116.jpg)
[Diagram: three stream sources and three workers; operators shown: select where.]
![Page 117: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/117.jpg)
val partitioner = partitionerClassName match {
  case "org.apache.cassandra.dht.Murmur3Partitioner" => Murmur3TokenFactory
  case "org.apache.cassandra.dht.RandomPartitioner" => RandomPartitionerTokenFactory
  case _ => throw new IllegalArgumentException(s"Unsupported partitioner: $partitionerClassName")
}

private def splitToCqlClause(range: TokenRange): Iterable[CqlTokenRange] = {
  if (range.end == tokenFactory.minToken)
    List(CqlTokenRange(s"token($pk) > ?", startToken))
  else if (range.start == tokenFactory.minToken)
    List(CqlTokenRange(s"token($pk) <= ?", endToken))
  else if (!range.isWrapAround)
    List(CqlTokenRange(s"token($pk) > ? AND token($pk) <= ?", startToken, endToken))
  else
    List(
      CqlTokenRange(s"token($pk) > ?", startToken),
      CqlTokenRange(s"token($pk) <= ?", endToken))
}
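The clause-splitting logic can be exercised in isolation. This is a simplified sketch rather than the connector's code: `TokenRange` and `CqlTokenRange` are reduced to `Long` tokens, `pk` is passed as a parameter, and `minToken` is assumed to be `Long.MinValue`, the minimum token of the Murmur3Partitioner.

```scala
object TokenRangeSplit {
  final case class TokenRange(start: Long, end: Long) {
    // A range that crosses the end of the ring, e.g. (100, -100].
    def isWrapAround: Boolean = start >= end
  }

  // A CQL predicate with the values to bind to its placeholders.
  final case class CqlTokenRange(cql: String, values: Long*)

  val minToken: Long = Long.MinValue // assumed: Murmur3Partitioner minimum token

  def splitToCqlClause(pk: String, range: TokenRange): List[CqlTokenRange] =
    if (range.end == minToken)
      List(CqlTokenRange(s"token($pk) > ?", range.start))
    else if (range.start == minToken)
      List(CqlTokenRange(s"token($pk) <= ?", range.end))
    else if (!range.isWrapAround)
      List(CqlTokenRange(s"token($pk) > ? AND token($pk) <= ?", range.start, range.end))
    else
      // A wrap-around range cannot be expressed as one interval,
      // so it is split into two queries.
      List(
        CqlTokenRange(s"token($pk) > ?", range.start),
        CqlTokenRange(s"token($pk) <= ?", range.end))
}
```

Each resulting clause maps to one query against the replicas owning that part of the ring, which is what allows the reads to be distributed across workers.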
![Page 118: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/118.jpg)
![Page 119: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/119.jpg)
![Page 120: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/120.jpg)
override def getPreferredLocations(split: Partition): Seq[String] =
  split.asInstanceOf[CassandraPartition].endpoints.flatMap(nodeAddresses.hostNames).toSeq

override def getPartitions: Array[Partition] = {
  val partitioner = CassandraRDDPartitioner(connector, tableDef, splitCount, splitSize)
  val partitions = partitioner.partitions(where)
  partitions
}

override def compute(split: Partition, context: TaskContext): Iterator[R] = {
  val session = connector.openSession()
  val partition = split.asInstanceOf[CassandraPartition]
  val tokenRanges = partition.tokenRanges
  val metricsUpdater = InputMetricsUpdater(context, readConf)

  val rowIterator = tokenRanges.iterator.flatMap(
    fetchTokenRange(session, _, metricsUpdater))

  new CountingIterator(rowIterator, limit)
}
![Page 121: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/121.jpg)
![Page 122: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/122.jpg)
object PushPredicateThroughProject extends Rule[LogicalPlan] with PredicateHelper {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case filter @ Filter(condition, project @ Project(fields, grandChild))
        if fields.forall(_.deterministic) =>

      val aliasMap = AttributeMap(fields.collect {
        case a: Alias => (a.toAttribute, a.child)
      })

      project.copy(child = Filter(replaceAlias(condition, aliasMap), grandChild))
  }
}
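The effect of such a rule can be shown on a toy plan. This is a sketch with a made-up three-node plan algebra (`Scan`, `Project`, `Filter`), not Spark's Catalyst API: the filter moves below the projection when the projection keeps the filtered column, and the result of executing the plan is unchanged.

```scala
object PushDown {
  type Row = Map[String, Int]

  sealed trait Plan
  final case class Scan(rows: List[Row]) extends Plan
  final case class Project(columns: List[String], child: Plan) extends Plan
  // Keeps rows whose `column` value is at least `min`.
  final case class Filter(column: String, min: Int, child: Plan) extends Plan

  // Moves a filter below a projection when the projection keeps the
  // filtered column, so rows are discarded before they are projected.
  def optimise(plan: Plan): Plan = plan match {
    case Filter(c, m, Project(cols, child)) if cols.contains(c) =>
      Project(cols, optimise(Filter(c, m, child)))
    case Filter(c, m, child)  => Filter(c, m, optimise(child))
    case Project(cols, child) => Project(cols, optimise(child))
    case scan: Scan           => scan
  }

  def execute(plan: Plan): List[Row] = plan match {
    case Scan(rows) => rows
    case Project(cols, child) =>
      execute(child).map(row => row.filter { case (k, _) => cols.contains(k) })
    case Filter(c, m, child) =>
      execute(child).filter(row => row.get(c).exists(_ >= m))
  }
}
```

Once the filter sits directly above the scan, a source that understands predicates can evaluate it itself, which is the step the "select where" diagrams depict.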
![Page 123: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/123.jpg)
![Page 124: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/124.jpg)
Table and stream duality
[Diagram: entries numbered 1 to 5.]
![Page 125: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/125.jpg)
Table and stream duality
[Diagram: entries numbered 1 to 5; entry 1 holds State X.]
![Page 126: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/126.jpg)
Table and stream duality
[Diagram: entries numbered 1 to 5; entry 1 holds State X, shown alongside the events Id 0, Event 1 and Id 0, Event 2.]
![Page 127: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/127.jpg)
Table and stream duality
Snapshot for offset N
[Diagram: entries numbered 1 to 5; entry 1 holds State X, shown alongside the events Id 0, Event 1 and Id 0, Event 2.]
![Page 128: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/128.jpg)
Table and stream duality
Snapshot for offset N
[Diagram: entries numbered 1 to 5; entry 1 holds State X, shown alongside the events Id 0, Event 1 and Id 0, Event 2. Snapshot N contains: Id 0, Offset 123, State X; Id 11, Offset 123, State X.]
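The duality can be stated compactly: a table is the fold of its change stream, and a snapshot is that fold cut off at an offset. The sketch below assumes upsert semantics and uses illustrative names (`Event`, `Snapshot`, `snapshotAt`, `restore`) that do not come from the talk.

```scala
object TableStreamDuality {
  final case class Event(id: Int, value: String)
  type Table = Map[Int, String]
  final case class Snapshot(offset: Int, table: Table)

  // A table is the fold of its change stream: each event upserts one row.
  def applyEvent(table: Table, e: Event): Table = table.updated(e.id, e.value)

  // State of the table after the first `offset` events.
  def snapshotAt(stream: Vector[Event], offset: Int): Snapshot =
    Snapshot(offset, stream.take(offset).foldLeft(Map.empty[Int, String])(applyEvent))

  // Restoring from a snapshot only replays events from its offset onwards.
  def restore(snapshot: Snapshot, stream: Vector[Event]): Table =
    stream.drop(snapshot.offset).foldLeft(snapshot.table)(applyEvent)
}
```

Restoring from any snapshot gives the same table as folding the whole stream, so the snapshot only shortens recovery and never changes the result.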
![Page 129: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/129.jpg)
Infinite streams application
[Diagram labels: Original table; Updates to the source of truth data; Continuous stream applying transformation function; Cache / view / index / replica / system / service.]
![Page 130: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/130.jpg)
[Diagram labels: internet, services, devices, social, apps; Kafka; Stream processing; Stream consumer; Search, Apps, Services, Databases, Batch; Serialisation.]
![Page 131: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/131.jpg)
Distributed systems
[Diagram labels: User, Mobile, System; Microservice (seven instances); CQRS/ES, Relational, NoSQL.]
![Page 132: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/132.jpg)
[Diagram labels: Client 1, Client 2, Client 3; Update; Model devices; Input data; Parameter devices; P; ΔP.]
![Page 133: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/133.jpg)
Challenges
● All the solved problems
○ Exactly once delivery
○ Consistency
○ Availability
○ Fault tolerance
○ Cross service invariants and consistency
○ Transactions
○ Automated deployment and configuration management
○ Serialization, versioning, compatibility
○ Automated elasticity
○ No downtime version upgrades
○ Graceful shutdown of nodes
○ Distributed system verification, logging, tracing, monitoring, debugging
○ Split brains
○ ...
![Page 134: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/134.jpg)
Conclusion
● From request, response, synchronous, mutable state
● To streams, asynchronous messaging
● Production ready distributed systems
![Page 135: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/135.jpg)
Questions
![Page 136: Data in Motion: Streaming Static Data Efficiently](https://reader034.fdocuments.net/reader034/viewer/2022051101/586f781e1a28ab10258b6a5d/html5/thumbnails/136.jpg)
@zapletal_martin @cakesolutions
347 708 1518
We are hiring: http://www.cakesolutions.net/careers