Building a High-Performance Database with Scala, Akka, and Spark

Transcript of Building a High-Performance Database with Scala, Akka, and Spark

Page 1: Building a High-Performance Database with Scala, Akka, and Spark

Building a High-Performance Database with Scala, Akka, and Spark

Evan Chan

Page 2: Building a High-Performance Database with Scala, Akka, and Spark

Who am I

• User of and contributor to Spark since 0.9, Cassandra since 0.6

• Created Spark Job Server and FiloDB

• Talks at Spark Summit, Cassandra Summit, Strata, Scala Days, etc.

• http://velvia.github.io/

Page 3: Building a High-Performance Database with Scala, Akka, and Spark

Streaming is now King

Page 4: Building a High-Performance Database with Scala, Akka, and Spark

[Diagram: Events flow from a Message Queue into a Stream Processing Layer, then into State / a Database, serving Happy Users]

Page 5: Building a High-Performance Database with Scala, Akka, and Spark

Why are Updates Important?

Appends

Streaming workloads add new data continuously.

Real data is *always* changing. Queries on live, real-time data have business benefits.

Updates

Idempotency = really simple ingestion pipelines (see the sketch below)

Simpler streaming: update late events later (see Spark 2.0 Structured Streaming)
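To make the idempotency point concrete, here is a minimal sketch (the Event type and in-memory store are illustrative, not FiloDB's API): because every event is written under its own key, replaying a batch after a failure converges to the same state instead of duplicating rows.

// Hypothetical sketch: each event carries its own key, so an upsert-style
// write makes replays harmless - the same batch applied twice yields the
// same state.
case class Event(id: String, value: Double)

final class KeyedStore {
  private val rows = scala.collection.mutable.Map.empty[String, Event]
  def upsert(e: Event): Unit = rows(e.id) = e   // same key => overwrite, never duplicate
  def size: Int = rows.size
}

object IdempotentIngest extends App {
  val store = new KeyedStore
  val batch = Seq(Event("a", 1.0), Event("b", 2.0))
  batch.foreach(store.upsert)
  batch.foreach(store.upsert)   // replaying the whole batch is a no-op
  assert(store.size == 2)
}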

Page 6: Building a High-Performance Database with Scala, Akka, and Spark

Introducing FiloDB

A distributed, versioned, columnar analytics database. With updates. Built for streaming.

http://www.github.com/filodb/FiloDB

Page 7: Building a High-Performance Database with Scala, Akka, and Spark

Fast Analytics Storage

• Scan speeds competitive with Apache Parquet

• In-memory version significantly faster

• Flexible filtering along two dimensions

• Much more efficient and flexible partition key filtering

• Efficient columnar storage using dictionary encoding and other techniques (a small sketch follows after this list)

• Updatable

• Spark SQL for easy BI integration
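To make the dictionary-encoding bullet concrete, here is a minimal sketch under simplified assumptions (plain Scala collections rather than FiloDB's actual column format): each distinct value is stored once in a dictionary, and the column itself becomes an array of small integer codes, which compresses well and scans fast.

object DictionaryEncodingSketch extends App {
  val column = Seq("nyc", "sf", "nyc", "nyc", "la", "sf")

  val dictionary: Array[String] = column.distinct.toArray         // code -> value
  val codeOf: Map[String, Int]  = dictionary.zipWithIndex.toMap   // value -> code
  val encoded: Array[Int]       = column.map(codeOf).toArray      // what actually gets stored

  // Reading back is just an array lookup per element.
  val decoded = encoded.map(i => dictionary(i))
  assert(decoded.sameElements(column))
  println(s"dictionary=${dictionary.toSeq}, codes=${encoded.toSeq}")
}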

Page 8: Building a High-Performance Database with Scala, Akka, and Spark

[Diagram: Events flow from a Message Queue into Spark Streaming, which writes short-term key-value state to Cassandra and events to FiloDB; Spark (ad-hoc, SQL, ML, batch) reads FiloDB to power dashboards and maps]

Page 9: Building a High-Performance Database with Scala, Akka, and Spark

100% Reactive

• Scala

• Akka Cluster

• Spark

• Typesafe Config for all configuration

• Scodec, Ficus, Enumeratum, Scalactic, etc.

• Even most of the performance-critical parts are written in Scala :)

Page 10: Building a High-Performance Database with Scala, Akka, and Spark

Scala, Akka, and Spark

• Akka - eliminates shared mutable state

• Remote and cluster makes building distributed client-server architectures easy

• Backpressure and at-least-once delivery are easy to build

• Failure handling and supervision are critical for databases

• Spark for SQL, DataFrames, ML, interfacing

Page 11: Building a High-Performance Database with Scala, Akka, and Spark

One FiloDB Node

[Diagram: data and commands arrive at the NodeCoordinatorActor (NCA), which routes them to a DatasetCoordinatorActor (DsCA) per dataset; each DsCA manages an Active MemTable and a Flushing MemTable, and a Reprojector writes flushed MemTables to the ColumnStore]

Page 12: Building a High-Performance Database with Scala, Akka, and Spark

Akka vs Futures

[Diagram: the same single-node components as above, annotated by responsibility - Akka actors (NCA, DsCAs) handle control flow, while core I/O (MemTables, Reprojector, ColumnStore) runs on Futures]

Page 13: Building a High-Performance Database with Scala, Akka, and Spark

Akka vs Futures

• Akka Actors:

• External FiloDB node API (remote + cluster)

• Async messaging with clients

• State management and scheduling (flushing)

• Futures:

• Core I/O

• Columnar data processing / ingestion

• Type-safe processing stages
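A minimal sketch of that division of labor, with illustrative names rather than FiloDB's actual classes: the actor owns mutable state and scheduling, while the I/O is a plain Future whose result is piped back to the actor as a message.

import akka.actor.{Actor, ActorSystem, Props}
import scala.concurrent.Future

case object WriteDone
final case class Ingest(rows: Seq[String])

// The actor is the only place that touches mutable state (control flow);
// the column-store write itself is a plain Future (core I/O).
class DatasetActor(writeToColumnStore: Seq[String] => Future[Unit]) extends Actor {
  import akka.pattern.pipe
  import context.dispatcher

  private var rowsIngested = 0L

  def receive: Receive = {
    case Ingest(rows) =>
      rowsIngested += rows.size
      writeToColumnStore(rows).map(_ => WriteDone).pipeTo(self)
    case WriteDone =>
      // a real implementation would ack the sender, schedule flushes, etc.
      println(s"ingested so far: $rowsIngested")
  }
}

object Demo extends App {
  val system = ActorSystem("filodb-sketch")
  val actor  = system.actorOf(Props(new DatasetActor(_ => Future.successful(()))), "dataset")
  actor ! Ingest(Seq("row1", "row2"))
}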

Page 14: Building a High-Performance Database with Scala, Akka, and Spark

Akka for Control Flow

[Diagram: a Client on the Spark Driver sends Flush() through a SingletonClusterProxy to the NodeClusterActor, which forwards it to the NodeCoordinatorActor (NCA) on each Executor; each NCA passes it on to its DatasetCoordinatorActors (DsCA1, DsCA2)]
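A hedged sketch of that control path - the names mirror the diagram, not FiloDB's actual message protocol: the NodeClusterActor tracks one coordinator per executor and fans the Flush command out to all of them.

import akka.actor.{Actor, ActorRef}

final case class RegisterCoordinator(ref: ActorRef)
final case class Flush(dataset: String)

// Receives control commands from the driver-side client and broadcasts them
// to every registered NodeCoordinatorActor.
class NodeClusterActor extends Actor {
  private var coordinators = Set.empty[ActorRef]
  def receive: Receive = {
    case RegisterCoordinator(ref) => coordinators += ref
    case f: Flush                 => coordinators.foreach(_ ! f)
  }
}

// One per executor; would hand the command to the DatasetCoordinatorActor
// that owns the dataset's MemTables.
class NodeCoordinatorActor extends Actor {
  def receive: Receive = {
    case Flush(dataset) => println(s"${self.path.name}: flushing $dataset")
  }
}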

Page 15: Building a High-Performance Database with Scala, Akka, and Spark

Yes, Akka in Spark

• Columnar ingestion is stateful - need stickiness of state. This is inherently difficult in Spark.

• Akka (cluster) gives us a separate, asynchronous control channel to talk to FiloDB ingestors

• Spark only gives data flow primitives, not async messaging

• We need to route incoming records to the correct ingestion node. Sorting data is inefficient and forces all nodes to wait for sorting to be done.

• On failure, can control state recovery and moving state
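A minimal sketch of partition-map routing, under simplified assumptions (the IngestRow type and map layout are illustrative): each record's partition key is hashed to find the node that owns it, so no cluster-wide sort is needed.

import akka.actor.ActorRef

final case class IngestRow(partitionKey: String, payload: Array[Byte])

// partitionMap(i) is the actor (e.g. a RowSourceActor / coordinator) that owns shard i.
class PartitionRouter(partitionMap: Vector[ActorRef]) {
  private def shardOf(key: String): Int =
    math.abs(key.hashCode % partitionMap.size)

  def route(row: IngestRow): Unit =
    partitionMap(shardOf(row.partitionKey)) ! row   // send to the owning ingestion node
}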

Page 16: Building a High-Performance Database with Scala, Akka, and Spark

Data Ingestion Setup

[Diagram: on each Executor, Spark tasks (task0, task1) feed RowSourceActors, which send rows to the local NodeCoordinatorActor (NCA) and its DatasetCoordinatorActors (DsCA1, DsCA2); the NodeClusterActor maintains the Partition Map that tells each RowSourceActor where to send its rows]

Page 17: Building a High-Performance Database with Scala, Akka, and Spark

FiloDB separate nodes

[Diagram: the same ingestion setup, except the NCA and DsCAs run in standalone FiloDB nodes rather than inside the Spark Executors; the RowSourceActors in each Executor's tasks send rows over the network, still routed via the NodeClusterActor's Partition Map]

Page 18: Building a High-Performance Database with Scala, Akka, and Spark

Akka wire protocol

Page 19: Building a High-Performance Database with Scala, Akka, and Spark

Backpressure

• Assumes the receiver is OK and starts sending rows

• Allows a configurable number of unacked messages before it stops sending

• Acking is the receiver's way of rate-limiting

• Automatic retries for at-least-once delivery

• NACK for when the receiver must stop (out of memory or MemTable full)
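A hedged sketch of that protocol, with illustrative message names rather than FiloDB's wire format: the sender keeps at most maxUnacked batches outstanding, each Ack frees the window, and a Nack means back off and retry later.

import akka.actor.{Actor, ActorRef}
import scala.concurrent.duration._

final case class Rows(seqNo: Long, rows: Seq[String])
final case class Ack(seqNo: Long)
final case class Nack(seqNo: Long)

class RowSender(receiver: ActorRef,
                maxUnacked: Int,
                batches: Iterator[Seq[String]]) extends Actor {
  import context.dispatcher

  private var nextSeq = 0L
  private var unacked = Map.empty[Long, Seq[String]]

  override def preStart(): Unit = fill()

  // Send until the window of unacked messages is full.
  private def fill(): Unit =
    while (unacked.size < maxUnacked && batches.hasNext) {
      val batch = batches.next()
      unacked += nextSeq -> batch
      receiver ! Rows(nextSeq, batch)
      nextSeq += 1
    }

  def receive: Receive = {
    case Ack(seq) =>
      unacked -= seq
      fill()                                   // window freed: keep sending (at-least-once)
    case Nack(seq) =>
      // Receiver must stop (e.g. MemTable full): back off, then retry that batch.
      context.system.scheduler.scheduleOnce(1.second, receiver, Rows(seq, unacked(seq)))
  }
}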

Page 20: Building a High-Performance Database with Scala, Akka, and Spark

Testing Akka Cluster

• MultiNodeSpec / sbt-multi-jvm

• AWESOME

• Test multi-node message routing

• Test cluster membership and subscription

• Inject network failures
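For readers who have not seen it, here is a rough sketch of the shape such a test takes (class, role, and actor names are illustrative, and the usual ScalaTest glue is inlined): two JVMs, one per role, synchronised with barriers.

import akka.actor.{Actor, Props}
import akka.remote.testkit.{MultiNodeConfig, MultiNodeSpec}
import akka.testkit.ImplicitSender
import org.scalatest.{BeforeAndAfterAll, WordSpecLike}

// Logical roles shared by every JVM in the test.
object TwoNodeConfig extends MultiNodeConfig {
  val first  = role("first")
  val second = role("second")
}

// sbt-multi-jvm launches one JVM per *MultiJvmNodeN class.
class PingSpecMultiJvmNode1 extends PingSpec
class PingSpecMultiJvmNode2 extends PingSpec

class Echo extends Actor { def receive = { case msg => sender() ! msg } }

abstract class PingSpec extends MultiNodeSpec(TwoNodeConfig)
  with WordSpecLike with BeforeAndAfterAll with ImplicitSender {
  import TwoNodeConfig._

  override def beforeAll(): Unit = multiNodeSpecBeforeAll()
  override def afterAll(): Unit  = multiNodeSpecAfterAll()
  def initialParticipants: Int   = roles.size

  "Multi-node message routing" must {
    "deliver a message from first to second and back" in {
      runOn(second) {
        system.actorOf(Props[Echo], "echo")
      }
      enterBarrier("echo-started")          // both JVMs wait here

      runOn(first) {
        val echo = system.actorSelection(node(second) / "user" / "echo")
        echo ! "ping"
        expectMsg("ping")
      }
      enterBarrier("finished")
    }
  }
}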

Page 21: Building a High-Performance Database with Scala, Akka, and Spark

Core: All Futures

/**
 * Clears all data from the column store for that given projection, for all versions.
 * More like a truncation, not a drop.
 * NOTE: please make sure there are no reprojections or writes going on before calling this
 */
def clearProjectionData(projection: Projection): Future[Response]

/**
 * Completely and permanently drops the dataset from the column store.
 * @param dataset the DatasetRef for the dataset to drop.
 */
def dropDataset(dataset: DatasetRef): Future[Response]

/**
 * Appends the ChunkSets and incremental indices in the segment to the column store.
 * @param segment the ChunkSetSegment to write / merge to the columnar store
 * @param version the version # to write the segment to
 * @return Success. Future.failure(exception) otherwise.
 */
def appendSegment(projection: RichProjection,
                  segment: ChunkSetSegment,
                  version: Int): Future[Response]

Page 22: Building a High-Performance Database with Scala, Akka, and Spark

Kamon Tracing

def appendSegment(projection: RichProjection,
                  segment: ChunkSetSegment,
                  version: Int): Future[Response] =
  Tracer.withNewContext("append-segment") {
    val ctx = Tracer.currentContext
    stats.segmentAppend()
    if (segment.chunkSets.isEmpty) {
      stats.segmentEmpty()
      return(Future.successful(NotApplied))
    }
    for { writeChunksResp <- writeChunks(projection.datasetRef, version, segment, ctx)
          writeIndexResp  <- writeIndices(projection, version, segment, ctx)
          if writeChunksResp == Success }
    yield {
      ctx.finish()
      writeIndexResp
    }
  }

private def writeChunks(dataset: DatasetRef,
                        version: Int,
                        segment: ChunkSetSegment,
                        ctx: TraceContext): Future[Response] = {
  asyncSubtrace(ctx, "write-chunks", "ingestion") {
    val binPartition = segment.binaryPartition
    val segmentId = segment.segmentId
    val chunkTable = getOrCreateChunkTable(dataset)
    Future.traverse(segment.chunkSets) { chunkSet =>
      chunkTable.writeChunks(binPartition, version, segmentId,
                             chunkSet.info.id, chunkSet.chunks, stats)
    }.map { responses => responses.head }
  }
}

Page 23: Building a High-Performance Database with Scala, Akka, and Spark

Kamon Tracing• http://kamon.io

• One trace can encapsulate multiple Future steps all executing on different threads

• Tunable tracing levels

• Summary stats and histograms for segments

• Super useful for production debugging of a reactive stack

Page 24: Building a High-Performance Database with Scala, Akka, and Spark

Kamon Metrics

• Uses HDRHistogram for much finer and more accurate buckets

• Built-in metrics for Akka actors, Spray, Akka-Http, Play, etc. etc.

KAMON trace name=append-segment n=2863 min=765952 p50=2113536 p90=3211264 p95=3981312 p99=9895936 p999=16121856 max=19529728
KAMON trace-segment name=write-chunks n=2864 min=436224 p50=1597440 p90=2637824 p95=3424256 p99=9109504 p999=15335424 max=18874368
KAMON trace-segment name=write-index n=2863 min=278528 p50=432128 p90=544768 p95=598016 p99=888832 p999=2260992 max=8355840

Page 25: Building a High-Performance Database with Scala, Akka, and Spark

Validation: Scalactic

private def getColumnsFromNames(allColumns: Seq[Column],
                                columnNames: Seq[String]): Seq[Column] Or BadSchema = {
  if (columnNames.isEmpty) {
    Good(allColumns)
  } else {
    val columnMap = allColumns.map { c => c.name -> c }.toMap
    val missing = columnNames.toSet -- columnMap.keySet
    if (missing.nonEmpty) {
      Bad(MissingColumnNames(missing.toSeq, "projection"))
    } else {
      Good(columnNames.map(columnMap))
    }
  }
}

for { computedColumns <- getComputedColumns(dataset.name, allColIds, columns)
      dataColumns <- getColumnsFromNames(columns, normProjection.columns)
      richColumns = dataColumns ++ computedColumns
      // scalac has problems dealing with (a, b, c) <- getColIndicesAndType... apparently
      segStuff  <- getColIndicesAndType(richColumns, Seq(normProjection.segmentColId), "segment")
      keyStuff  <- getColIndicesAndType(richColumns, normProjection.keyColIds, "row")
      partStuff <- getColIndicesAndType(richColumns, dataset.partitionColumns, "partition") } yield {

• Notice how multiple validations compose!
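For readers new to Scalactic, here is a minimal, self-contained sketch of that composition pattern (the port-validation example is ours, not FiloDB code): each step returns Good or Bad, and the for-comprehension short-circuits on the first Bad.

import org.scalactic._

object OrComposition extends App {
  def parsePort(s: String): Int Or ErrorMessage =
    try Good(s.toInt)
    catch { case _: NumberFormatException => Bad(s"'$s' is not a number") }

  def checkRange(port: Int): Int Or ErrorMessage =
    if (port > 0 && port < 65536) Good(port) else Bad(s"port $port out of range")

  // Compose the validations; the first Bad short-circuits the rest.
  val result: Int Or ErrorMessage =
    for {
      parsed  <- parsePort("8086")
      checked <- checkRange(parsed)
    } yield checked

  println(result)   // Good(8086)
}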

Page 26: Building a High-Performance Database with Scala, Akka, and Spark

Machine-Speed Scala

http://github.com/velvia/filo

https://github.com/filodb/FiloDB/blob/new-storage-format/core/src/main/scala/filodb.core/binaryrecord/BinaryRecord.scala

Page 27: Building a High-Performance Database with Scala, Akka, and Spark

Filo: High Performance Binary Vectors

• Designed for NoSQL, not a file format

• random or linear access

• on or off heap

• missing value support

• Scala only, but cross-platform support possible

http://github.com/velvia/filo is a binary data vector library designed for extreme read performance with minimal deserialization costs.

Page 28: Building a High-Performance Database with Scala, Akka, and Spark

Billions of Ops / Sec

• JMH benchmark: 0.5ns per FiloVector element access / add

• 2 Billion adds per second - single threaded

• Who said Scala cannot be fast?

• Spark API (row-based) limits performance significantly

val randomInts = (0 until numValues).map(i => util.Random.nextInt)
val randomIntsAray = randomInts.toArray
val filoBuffer = VectorBuilder(randomInts).toFiloBuffer
val sc = FiloVector[Int](filoBuffer)

@Benchmark
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
def sumAllIntsFiloApply(): Int = {
  var total = 0
  for { i <- 0 until numValues optimized } {
    total += sc(i)
  }
  total
}

Page 29: Building a High-Performance Database with Scala, Akka, and Spark

Thank you Scala OSS!