How Spotify scales Apache Storm Pipelines

Figure: Log events from Apache Kafka feed the Real-time Personalization Pipeline, which runs on Apache Storm and uses two Apache Cassandra stores: the User Profile Store and the Entity Metadata Store.
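The data flow in the architecture diagram can be pictured as a short chain of stream stages. The following is a minimal, single-process Python sketch of that flow, not Spotify's implementation: a generator stands in for the Kafka spout, and two plain dicts stand in for the Cassandra-backed Entity Metadata Store and User Profile Store. The event shape (user id, entity id), the single metadata attribute per entity, and the names `log_event_spout`, `enrich_bolt` and `profile_bolt` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (user_id, entity_id).
Event = Tuple[str, str]


def log_event_spout(events: Iterable[Event]) -> Iterator[Event]:
    """Stand-in for a Kafka spout: emits log events one at a time."""
    yield from events


def enrich_bolt(stream: Iterable[Event],
                entity_metadata: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Joins each event with the Entity Metadata Store (a dict here).

    Assumes the metadata maps an entity id to one attribute, e.g. a genre;
    events for unknown entities are dropped.
    """
    for user_id, entity_id in stream:
        attribute = entity_metadata.get(entity_id)
        if attribute is not None:
            yield user_id, attribute


def profile_bolt(stream: Iterable[Tuple[str, str]],
                 user_profiles: Dict[str, Counter]) -> None:
    """Counts attributes per user in the User Profile Store (a dict here)."""
    for user_id, attribute in stream:
        user_profiles.setdefault(user_id, Counter())[attribute] += 1


if __name__ == "__main__":
    metadata = {"track-1": "jazz", "track-2": "rock"}  # Entity Metadata Store stand-in
    profiles: Dict[str, Counter] = {}                  # User Profile Store stand-in
    events = [("alice", "track-1"), ("alice", "track-1"), ("bob", "track-2")]
    profile_bolt(enrich_bolt(log_event_spout(events), metadata), profiles)
    print(profiles)
```

In a real Storm topology each stage would be a separate spout or bolt with its own parallelism, and the dict lookups would be reads and writes against Cassandra.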

Figure: Deploying a new topology version on the Storm cluster
t1, t2: v1 is running; build v2.
t3, t4: v1 and v2 are both on the cluster. Deactivate v1, submit v2, check the v2 graphs, then kill v1.
t5 to t8: only v2 is running.
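The upgrade sequence in the figure lends itself to a small script. This is a minimal Python sketch, assuming the `storm` CLI is on the PATH and that the (hypothetical) topology main class takes the topology name as its first argument. v1 and v2 are submitted under different names, since Storm rejects a submission whose topology name is already alive. The health check stands in for "check the v2 graphs" and is supplied by the caller, because the slides do not say what was checked; the rollback branch is an assumption, not something the slides show.

```python
import subprocess
from typing import Callable, List

Command = List[str]


def run_cli(cmd: Command) -> None:
    """Runs one storm CLI command and raises if it exits non-zero."""
    subprocess.run(cmd, check=True)


def upgrade_topology(old_name: str, new_name: str, jar: str, main_class: str,
                     v2_healthy: Callable[[], bool],
                     run: Callable[[Command], None] = run_cli) -> List[Command]:
    """Deactivate v1, submit v2, check v2, then kill v1 (or roll back).

    Returns the commands in the order they were run.
    """
    if old_name == new_name:
        # Storm refuses to submit a topology whose name is already alive.
        raise ValueError("v1 and v2 need distinct topology names")
    executed: List[Command] = []

    def step(cmd: Command) -> None:
        run(cmd)
        executed.append(cmd)

    step(["storm", "deactivate", old_name])            # v1's spouts stop emitting
    step(["storm", "jar", jar, main_class, new_name])  # submit v2 next to v1
    if v2_healthy():                                   # stands in for "check v2 graphs"
        step(["storm", "kill", old_name])              # retire v1
    else:
        # Assumed rollback, not on the slide: drop v2 and resume v1.
        step(["storm", "kill", new_name])
        step(["storm", "activate", old_name])
    return executed
```

Passing `run=print` gives a dry run that lists the sequence without touching the cluster.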

Figure panels: PagerDuty, In-house Solution

● Use separate tables for data with different TTLs and set gc_grace_seconds = 0. Disable read repairs.

● Use DateTieredCompactionStrategy for short-lived data.

● Control the number of open connections from the Storm topology to Cassandra.

● Configure the snitch to ensure proper request routing.
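The table-level settings in these bullets can be expressed as CQL. The following minimal Python sketch emits one CREATE TABLE statement per TTL class; the keyspace, table names, TTL values and columns are hypothetical. The option names (`default_time_to_live`, `gc_grace_seconds`, `read_repair_chance`, `dclocal_read_repair_chance`, `compaction`) are Cassandra table options from the 2.x/3.x line. Setting `gc_grace_seconds = 0` is presumably reasonable here because rows leave these tables by TTL expiry rather than explicit deletes. Note that the read-repair chance options were removed in Cassandra 4.0, and DateTieredCompactionStrategy was later deprecated in favor of TimeWindowCompactionStrategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtlTable:
    name: str         # hypothetical table name, one table per TTL class
    ttl_seconds: int  # every row written to this table expires after this long


def create_table_cql(keyspace: str, table: TtlTable) -> str:
    """Builds a CREATE TABLE statement for one TTL class of short-lived data."""
    if table.ttl_seconds <= 0:
        # default_time_to_live = 0 would mean "never expire".
        raise ValueError("ttl_seconds must be positive")
    options = [
        f"default_time_to_live = {table.ttl_seconds}",  # per-table TTL
        "gc_grace_seconds = 0",                          # tombstones purgeable at next compaction
        "read_repair_chance = 0.0",                      # read repair off
        "dclocal_read_repair_chance = 0.0",
        "compaction = {'class': 'DateTieredCompactionStrategy'}",
    ]
    return (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table.name} (\n"
        "    user_id text,\n"  # hypothetical columns
        "    entity_id text,\n"
        "    payload blob,\n"
        "    PRIMARY KEY (user_id, entity_id)\n"
        ") WITH " + "\n    AND ".join(options) + ";"
    )


if __name__ == "__main__":
    for t in (TtlTable("recent_activity_1h", 3600),
              TtlTable("recent_activity_7d", 7 * 24 * 3600)):
        print(create_table_cql("personalization", t))
```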

● 1 worker per node per topology
● 1 executor per core for CPU-bound tasks
● 1-10 executors per core for I/O-bound tasks
● Compute the total parallelism possible and distribute it amongst slow and fast tasks: high parallelism for slow tasks, low for fast tasks.

* Parallelism tuning inspired by P. Taylor Goetz’s Strata 2014 talk
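These rules of thumb reduce to simple arithmetic. The following is a minimal Python sketch, not the method from the talk: total parallelism is nodes * cores per node * executors per core (1 for CPU-bound work, up to 10 for I/O-bound work), and it is split across components in proportion to an assumed per-tuple latency, so slow components get more executors and fast ones fewer. The component names and latencies are hypothetical, and the proportional split with largest-remainder rounding is one possible choice. In Storm, the resulting counts would be passed as parallelism hints to `TopologyBuilder.setSpout`/`setBolt`, and the worker count (one per node) to `Config.setNumWorkers`.

```python
from typing import Dict


def total_parallelism(nodes: int, cores_per_node: int, executors_per_core: int) -> int:
    """Total executors for one topology: 1 worker per node, N executors per core."""
    if not 1 <= executors_per_core <= 10:
        raise ValueError("use 1 executor/core for CPU-bound, up to 10 for I/O-bound")
    return nodes * cores_per_node * executors_per_core


def distribute(total: int, latency_ms: Dict[str, float]) -> Dict[str, int]:
    """Splits `total` executors so slower components get proportionally more.

    Every component gets at least one executor; the rest is shared by
    per-tuple latency and rounded with the largest-remainder method so the
    counts add up to exactly `total`.
    """
    if total < len(latency_ms):
        raise ValueError("need at least one executor per component")
    if any(lat <= 0 for lat in latency_ms.values()):
        raise ValueError("latencies must be positive")
    weight_sum = sum(latency_ms.values())
    spare = total - len(latency_ms)
    shares = {name: spare * lat / weight_sum for name, lat in latency_ms.items()}
    counts = {name: 1 + int(share) for name, share in shares.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining executors to the largest fractional parts.
    by_fraction = sorted(shares, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    nodes = 4
    total = total_parallelism(nodes, cores_per_node=8, executors_per_core=2)
    print("workers:", nodes, "executors:", total)
    print(distribute(total, {"kafka-spout": 1.0, "enrich-bolt": 20.0,
                             "profile-writer-bolt": 10.0}))
```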

● Think about the constraints of external vs. in-process caching
  ○ External caching
    ■ Network I/O
    ■ Latency
    ■ Another point of failure
  ○ In-process caching
    ■ Limited memory
    ■ No persistence
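For the in-process option, the two listed constraints can be handled inside the cache itself. This is a minimal Python sketch, assuming a per-worker cache in front of an external store such as Cassandra: `max_entries` bounds memory with least-recently-used eviction, a TTL bounds staleness (the TTL is an assumption; the slide does not mention one), and nothing is persisted, so a restarted worker starts cold. The class and method names are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, Tuple, TypeVar

V = TypeVar("V")


class InProcessCache(Generic[V]):
    """Bounded LRU cache with a per-entry TTL, held in the worker's memory.

    A cached value of None is indistinguishable from a miss.
    """

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, value); order tracks recency of use
        self._data: "OrderedDict[Hashable, Tuple[float, V]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[V]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]       # expired: drop and report a miss
            return None
        self._data.move_to_end(key)   # mark as most recently used
        return value

    def put(self, key: Hashable, value: V) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict least recently used

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], V]) -> V:
        """Returns a cached value, or loads it (e.g. from the external store)."""
        value = self.get(key)
        if value is None:
            value = loader(key)
            self.put(key, value)
        return value
```

The `clock` parameter exists only so the expiry logic can be tested without sleeping.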
