Building a High-Performance Database with Scala, Akka, and Spark
21-Apr-2017
Evan Chan
Who am I?
User and contributor to Spark since 0.9, Cassandra since 0.6
Created Spark Job Server and FiloDB
Talks at Spark Summit, Cassandra Summit, Strata, Scala Days, etc.
http://github.com/spark-jobserver/spark-jobserver
http://github.com/filodb/FiloDB
http://velvia.github.io/
Streaming is now King
(Diagram: Events → Message Queue → Stream Processing Layer → State / Database → Happy Users)
Why are Updates Important?
Appends:
Streaming workloads add new data continuously.
Real data is *always* changing; queries on live, real-time data have business benefits.
Updates:
Idempotency = really simple ingestion pipelines
Simpler streaming: late events can update existing data (see Spark 2.0 Structured Streaming)
Introducing FiloDB
A distributed, versioned, columnar analytics database. With updates. Built for streaming.
http://www.github.com/filodb/FiloDB
Fast Analytics Storage
Scan speeds competitive with Apache Parquet
In-memory version significantly faster
Flexible filtering along two dimensions
Much more efficient and flexible partition key filtering
Efficient columnar storage using dictionary encoding and other techniques (see the sketch below)
Updatable
Spark SQL for easy BI integration
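The dictionary-encoding idea above can be shown with a tiny self-contained sketch (illustrative only, not FiloDB's actual storage format): repeated values are stored once in a dictionary, and the column itself holds only small integer codes.

// Minimal dictionary-encoding sketch (illustrative only, not FiloDB's format):
// each distinct string gets a small integer code, and the column stores the codes.
object DictEncodingExample extends App {
  def encode(column: Seq[String]): (Map[String, Int], Array[Int]) = {
    val dict  = column.distinct.zipWithIndex.toMap   // value -> code
    val codes = column.map(dict).toArray             // column rewritten as integer codes
    (dict, codes)
  }

  val (dict, codes) = encode(Seq("nyc", "sf", "nyc", "nyc", "sf"))
  val reverse = dict.map(_.swap)                     // code -> value, for decoding

  println(codes.toSeq)                               // e.g. 0, 1, 0, 0, 1
  println(codes.map(reverse).toSeq)                  // decodes back to the original strings
}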
(Architecture diagram) Events flow into a Message Queue and are consumed by Spark Streaming; short-term K-V state goes to Cassandra, while events for ad-hoc and batch analysis go to FiloDB. Spark serves ad-hoc, SQL, and ML queries over that data, feeding dashboards and maps.
100% Reactive Scala
Akka Cluster
Spark
Typesafe Config for all configuration (see the sketch below)
Scodec, Ficus, Enumeratum, Scalactic, etc.
Even most of the performance-critical parts are written in Scala :)
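To illustrate the Typesafe Config point above, configuration might be read like this; the filodb.memtable keys shown here are made up for the example.

// Hypothetical Typesafe Config usage (the keys are illustrative, not FiloDB's real settings).
import com.typesafe.config.ConfigFactory

object ConfigExample extends App {
  // Loads application.conf / reference.conf from the classpath plus system properties.
  val config = ConfigFactory.load()

  val flushInterval = config.getDuration("filodb.memtable.flush-interval")
  val maxRows       = config.getInt("filodb.memtable.max-rows-per-table")

  println(s"flush every $flushInterval, at most $maxRows rows per memtable")
}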
Scala, Akka, and Spark
Akka: eliminate shared mutable state
Remote and cluster makes building distributed client-server architectures easy
Backpressure, at-least-once is easy to build
Failure handling and supervision are critical for databases (see the supervision sketch below)
Spark for SQL, DataFrames, ML, interfacing
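A minimal sketch of the supervision point, with invented actor and exception choices, showing how a parent decides per-failure whether a child restarts or stops:

// Hedged sketch of Akka supervision (actor and exception choices are hypothetical).
import akka.actor.{Actor, OneForOneStrategy, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Restart, Stop}
import scala.concurrent.duration._

class IngestionSupervisor extends Actor {
  // Restart a child on transient I/O errors, stop it on anything else.
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: java.io.IOException => Restart
      case _: Throwable           => Stop
    }

  def receive: Receive = {
    case _ => ()   // in a real supervisor, work would be forwarded to child actors
  }
}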
One FiloDB Node
(Diagram) Data and commands arrive at the NodeCoordinatorActor (NCA), which routes them to per-dataset DatasetCoordinatorActors (DsCA). Each DsCA manages an Active MemTable and a Flushing MemTable, and a Reprojector writes flushed data to the ColumnStore.
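A hedged sketch of how that hierarchy could be wired up, with simplified message and child names (this is not FiloDB's actual actor code):

// Rough sketch of the per-node actor hierarchy (messages and names are simplified).
import akka.actor.{Actor, ActorRef, Props}

final case class IngestRows(dataset: String, rows: Seq[Map[String, Any]])

class DatasetCoordinatorActor(dataset: String) extends Actor {
  def receive: Receive = {
    case IngestRows(_, _) =>
      ()   // append to the active MemTable; trigger a flush + reprojection when full
  }
}

class NodeCoordinatorActor extends Actor {
  private var dsActors = Map.empty[String, ActorRef]

  def receive: Receive = {
    case msg @ IngestRows(dataset, _) =>
      // Create one DatasetCoordinatorActor per dataset, then route the message to it.
      val dsa = dsActors.getOrElse(dataset, {
        val ref = context.actorOf(Props(new DatasetCoordinatorActor(dataset)), s"dsca-$dataset")
        dsActors += dataset -> ref
        ref
      })
      dsa forward msg
  }
}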
Akka vs Futures
(Same node diagram, annotated: the actor layer of NCA and DsCAs handles control flow, while core I/O against the MemTables, Reprojector, and ColumnStore is done with Futures.)
Akka vs Futures
Akka Actors:
External FiloDB node API (remote + cluster)
Async messaging with clients
State management and scheduling (flushing)
Futures:
Core I/O
Columnar data processing / ingestion
Type-safe processing stages
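One common way to combine the two halves of this split is an actor that delegates its I/O to a Future and pipes the result back to the requester; a minimal sketch with invented message names:

// Sketch of mixing actor control flow with Future-based I/O via pipeTo
// (message and method names are invented for the example).
import akka.actor.Actor
import akka.pattern.pipe
import scala.concurrent.Future

final case class WriteSegment(bytes: Array[Byte])
sealed trait Response
case object Ack extends Response
final case class WriteFailed(t: Throwable) extends Response

class SegmentWriter extends Actor {
  import context.dispatcher   // ExecutionContext for the Future and for pipeTo

  // Stand-in for a real column-store write that returns a Future[Response].
  private def writeToColumnStore(bytes: Array[Byte]): Future[Response] =
    Future { Ack }

  def receive: Receive = {
    case WriteSegment(bytes) =>
      // The Future runs on the dispatcher; its result is piped back to the asker.
      writeToColumnStore(bytes)
        .recover { case t => WriteFailed(t) }
        .pipeTo(sender())
  }
}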
Akka for Control Flow
(Diagram) A Client on the Spark Driver sends control messages such as Flush() through a SingletonClusterProxy to the NodeClusterActor, which relays them to the NCAs and their DsCAs on each Executor.
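The singleton-plus-proxy arrangement in the diagram can be built with Akka's cluster-singleton tooling (akka-cluster-tools) roughly as follows; only the NodeClusterActor name comes from the deck, the rest is an illustrative sketch:

// Sketch of running NodeClusterActor as a cluster singleton and reaching it through a proxy.
import akka.actor.{Actor, ActorSystem, PoisonPill, Props}
import akka.cluster.singleton._

// Stub standing in for the real NodeClusterActor from the deck.
class NodeClusterActor extends Actor { def receive: Receive = { case _ => () } }

object SingletonSetup extends App {
  val system = ActorSystem("filo-cluster")

  // One NodeClusterActor instance for the whole cluster, managed on the oldest node.
  system.actorOf(
    ClusterSingletonManager.props(
      singletonProps     = Props[NodeClusterActor],
      terminationMessage = PoisonPill,
      settings           = ClusterSingletonManagerSettings(system)),
    name = "nodecluster")

  // Any node (e.g. the Spark driver) reaches the singleton through a proxy.
  val nodeClusterProxy = system.actorOf(
    ClusterSingletonProxy.props(
      singletonManagerPath = "/user/nodecluster",
      settings             = ClusterSingletonProxySettings(system)),
    name = "nodeClusterProxy")

  // nodeClusterProxy ! Flush()   // control messages would go via the proxy
}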
Yes, Akka in Spark
Columnar ingestion is stateful: it needs stickiness of state, which is inherently difficult in Spark.
Akka (cluster) gives us a separate, asynchronous control channel to talk to FiloDB ingestors.
Spark only gives data flow primitives, not async messaging.
We need to route incoming records to the correct ingestion node (see the routing sketch below); sorting the data instead is inefficient and forces all nodes to wait for the sort to finish.
On failure, we can control state recovery and move state.
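A toy version of that routing decision, assuming a partition map that assigns hashed partition keys to ingestion nodes (all types here are invented for illustration):

// Toy partition-map routing sketch: hash the record's partition key and look up
// which ingestion node owns that bucket.
final case class Record(partitionKey: String, payload: Map[String, Any])

class PartitionMap(nodes: IndexedSeq[String]) {
  // Very simple ownership rule: bucket the key hash over the node list.
  def nodeFor(record: Record): String = {
    val bucket = math.abs(record.partitionKey.hashCode) % nodes.size
    nodes(bucket)
  }
}

object RoutingExample extends App {
  val partitionMap = new PartitionMap(Vector("node-a", "node-b", "node-c"))
  val record = Record("gps-2017-04-21", Map("lat" -> 37.77, "lon" -> -122.42))
  // In the real system the record would be handed to that node's ingestion actor.
  println(s"route to ${partitionMap.nodeFor(record)}")
}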
Data Ingestion Setup
(Diagram) Each Spark Executor is also a FiloDB node: it runs ingestion tasks (task0, task1), and every task has a Row Source Actor feeding rows to the local NCA and its DsCAs. A NodeClusterActor maintains the Partition Map that tells each Row Source Actor which node owns which partitions.
FiloDB Separate Nodes
(Diagram) The same setup, but the FiloDB nodes (NCA + DsCAs) run separately from the Spark Executors; the Row Source Actors in each task send rows to them over the Akka wire protocol, and the NodeClusterActor still maintains the Partition Map.
Backpressure
Sender assumes the receiver is OK and starts sending rows
Allows a configurable number of unacked messages before the sender stops sending
Acking is the receiver's way of rate-limiting
Automatic retries for at-least-once delivery
NACK for when the receiver must stop (out of memory or MemTable full)
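A stripped-down, sender-side sketch of that ack-window protocol; the message names and window size are invented, and FiloDB's real row-source logic is more involved:

// Simplified ack-window backpressure sketch (protocol names invented for the example).
import akka.actor.{Actor, ActorRef}

final case class SendRows(seqNo: Long, rows: Seq[String])
final case class Ack(seqNo: Long)
final case class Nack(seqNo: Long)

class RowSender(receiver: ActorRef, batches: Iterator[Seq[String]], maxUnacked: Int = 8) extends Actor {
  private var nextSeqNo = 0L
  private var sent      = Map.empty[Long, Seq[String]]   // outstanding batches, kept for retries

  override def preStart(): Unit = fillWindow()

  // Keep sending until the number of outstanding, unacked batches reaches the window size.
  private def fillWindow(): Unit =
    while (sent.size < maxUnacked && batches.hasNext) {
      val rows = batches.next()
      sent += nextSeqNo -> rows
      receiver ! SendRows(nextSeqNo, rows)
      nextSeqNo += 1
    }

  def receive: Receive = {
    case Ack(seqNo) =>
      sent -= seqNo
      fillWindow()                        // acks open the window again
    case Nack(seqNo) =>
      // Receiver asked us to pause (e.g. MemTable full); a real implementation
      // would back off before resending the nacked batch for at-least-once delivery.
      sent.get(seqNo).foreach(rows => receiver ! SendRows(seqNo, rows))
  }
}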
Testing Akka Cluster
MultiNodeSpec / sbt-multi-jvm
AWESOME
Test multi-node message routing
Test cluster membership and subscription
Inject network failures
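A minimal sketch of what such a multi-JVM test can look like, assuming akka-multi-node-testkit, ScalaTest, and the sbt-multi-jvm plugin are configured; the EchoSpec and EchoActor names are invented:

// Sketch of a two-node MultiNodeSpec routing a message across JVMs.
import akka.actor.{Actor, Props}
import akka.remote.testkit.{MultiNodeConfig, MultiNodeSpec}
import akka.testkit.ImplicitSender
import org.scalatest.{BeforeAndAfterAll, Matchers, WordSpecLike}

object EchoConfig extends MultiNodeConfig {
  val first  = role("first")
  val second = role("second")
}

class EchoActor extends Actor {
  def receive: Receive = { case msg => sender() ! msg }
}

// sbt-multi-jvm starts one JVM per *MultiJvmNodeN class.
class EchoSpecMultiJvmNode1 extends EchoSpec
class EchoSpecMultiJvmNode2 extends EchoSpec

class EchoSpec extends MultiNodeSpec(EchoConfig)
  with WordSpecLike with Matchers with BeforeAndAfterAll with ImplicitSender {

  import EchoConfig._

  override def initialParticipants: Int = roles.size
  override def beforeAll(): Unit = multiNodeSpecBeforeAll()
  override def afterAll(): Unit  = multiNodeSpecAfterAll()

  "Message routing across nodes" must {
    "deliver a message from the first node to an actor on the second" in {
      runOn(second) {
        system.actorOf(Props[EchoActor], "echo")
      }
      enterBarrier("echo-started")

      runOn(first) {
        system.actorSelection(node(second) / "user" / "echo") ! "ping"
        expectMsg("ping")
      }
      enterBarrier("finished")
    }
  }
}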
Core: All Futures

/**
 * Clears all data from the column store for that given projection, for all versions.
 * More like a truncation, not a drop.
 * NOTE: please make sure there are no reprojections or writes going on before calling this
 */
def clearProjectionData(projection: Projection): Future[Response]

/**
 * Completely and permanently drops the dataset from the column store.
 * @param dataset the DatasetRef for the dataset to drop.
 */
def dropDataset(dataset: DatasetRef): Future[Response]

/**
 * Appends the ChunkSets and incremental indices in the segment to the column store.
 * @param segment the ChunkSetSegment to write / merge to the columnar store
 * @param version the version # to write the segment to
 * @return Success. Future.failure(exception) otherwise.
 */
def appendSegment(projection: RichProjection,
                  segment: ChunkSetSegment,
                  version: Int): Future[Response]
Kamon Tracing

def appendSegment(projection: RichProjection,
                  segment: ChunkSetSegment,
                  version: Int): Future[Response] =
  Tracer.withNewContext("append-segment") {
    val ctx = Tracer.currentContext
    stats.segmentAppend()
    if (segment.chunkSets.isEmpty) {
      stats.segmentEmpty()
      return(Future.successful(NotApplied))
    }
    for { writeChunksResp <- ...   // the generators of this for-comprehension did not survive transcription
        } yield responses.head
  }
Kamon Tracing http://kamon.io
One trace can encapsulate multiple Future steps all executing on different threads
Tunable tracing levels
Summary stats and histograms for segments
Super useful for production debugging of a reactive stack
Kamon Metrics
Uses HDRHistogram for much finer and more accurate buckets
Built-in metrics for Akka actors, Spray, Akka-Http, Play, etc. etc.
KAMON trace name=append-segment n=2863 min=765952 p50=2113536 p90=3211264 p95=3981312 p99=9895936 p999=16121856 max=19529728
KAMON trace-segment name=write-chunks n=2864 min=436224 p50=1597440 p90=2637824 p95=3424256 p99=9109504 p999=15335424 max=18874368
KAMON trace-segment name=write-index n=2863 min=278528 p50=432128 p90=544768 p95=598016 p99=888832 p999=2260992 max=8355840
Validation: Scalactic

private def getColumnsFromNames(allColumns: Seq[Column],
                                columnNames: Seq[String]): Seq[Column] Or BadSchema = {
  if (columnNames.isEmpty) {
    Good(allColumns)
  } else {
    val columnMap = allColumns.map { c => c.name -> c }.toMap
    val missing = columnNames.toSet -- columnMap.keySet
    if (missing.nonEmpty) {
      Bad(MissingColumnNames(missing.toSeq, "projection"))
    } else {
      Good(columnNames.map(columnMap))
    }
  }
}

for { computedColumns
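Or values like the one above compose in for-comprehensions, which is how several validation steps chain together; a small self-contained sketch with an invented error type:

// Sketch of chaining Scalactic Or validations in a for-comprehension
// (the BadInput error type and the two checks are invented for the example).
import org.scalactic.{Bad, Good, Or}

final case class BadInput(msg: String)

def parsePort(s: String): Int Or BadInput =
  try Good(s.toInt) catch { case _: NumberFormatException => Bad(BadInput(s"not a number: $s")) }

def checkRange(port: Int): Int Or BadInput =
  if (port > 0 && port < 65536) Good(port) else Bad(BadInput(s"out of range: $port"))

// Stops at the first Bad; otherwise carries the Good value through each step.
def validatedPort(s: String): Int Or BadInput =
  for {
    parsed  <- parsePort(s)
    inRange <- checkRange(parsed)
  } yield inRange

// validatedPort("8080") == Good(8080); validatedPort("99999") is a Bad(BadInput(...))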
Machine-Speed Scala
http://github.com/velvia/filo
https://github.com/filodb/FiloDB/blob/new-storage-format/core/src/main/scala/filodb.core/binaryrecord/BinaryRecord.scala
Filo: High Performance Binary Vectors
Designed for NoSQL, not a file format
random or linear access
on or off heap
missing value support
Scala only, but cross-platform support possible
http://github.com/velvia/filo is a binary data vector library designed for extreme read performance with minimal deserialization costs.
Billions of Ops / Sec
JMH benchmark: 0.5ns per FiloVector element access / add
2 Billion adds per second - single threaded
Who said Scala cannot be fast?
Spark API (row-based) limits performance significantly
val randomInts = (0 until numValues).map(i => util.Random.nextInt)
val randomIntsAray = randomInts.toArray
val filoBuffer = VectorBuilder(randomInts).toFiloBuffer
val sc = FiloVector[Int](filoBuffer)

@Benchmark
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
def sumAllIntsFiloApply(): Int = {
  var total = 0
  for { i <- 0 until numValues } {   // the transcript cuts off here; loop body reconstructed:
    total += sc(i)                   // sum every element via the FiloVector apply
  }
  total
}
Thank you Scala OSS!