Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine...

25
Doing It Live Machine learning at scale, and in an instant

Transcript of Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine...

Page 1: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

Doing It Live Machine learning at scale, and in an instant

Page 2: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

Christopher Smith

● VP of Engineering, Data Science

[email protected]

● FanGraph - Single view of the fan

Page 3: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

3

Visible Impact

Page 4: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

Sell fun at scale

Page 5: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

Sell out a stadium …in minutes

Page 6: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

▪Massive Traffic Spikes

▪Real Time Data at Scale

▪Separating Fans from Bots

Data Science?

Doing it live 6

Page 7: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

7

Page 8: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

8 Data Sets

Engage-

ment

Entitle-

ments

Access

Scans 10 Inventory

Avails

20 Inventory

Avails Payment

Member

Account

In

Venue Social

Event

Meta

Settle-

ment

BC Distributed

Commerce

Recos Abuse

Prevention

Resale

Fraud

Marketing &

Sponsorship

Winback Customer

Service

Analytics Verified Fan

Data From Everywhere… to Everywhere

Page 9: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

9

Core Principles

• Source systems have one job: publish mutations/data

• Consumers don’t impact producers

• Process data one tuple at a time, in a FP-like fashion

• Data is available everywhere

• Data is always archived

• Organize data per project’s needs

Page 10: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

10

Core Components

• Kafka: reliable, scalable data transit – one cluster per DC

• CKAN: data discovery

• Avro: efficient, extensible serialization w/schema

• Secor: archiving

• Storm/Trident: complex lambda processing

• Vowpal Wabbit: fast, online machine learning

• ElasticSearch: flexible document search

Page 11: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

11

CKAN

Page 12: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

12

Kafka

Page 13: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

13

Avro

Page 14: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

14

Secor

• Highly scalable

• Aggregate Kafka topic data by time

• Flexible output formats (sequence files, parquet, csv)

• Compression

• Store in s3 (cheap)

Page 15: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

15

Storm/Trident

Page 16: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

16

Storm/Trident

public class MyFunction extends BaseFunction {

public void execute(TridentTuple tuple, TridentCollector collector) { … }

}

public class MyFilter extends BaseFilter {

public boolean isKeep(TridentTuple tuple) { … }

}

Page 17: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

17

Storm/Trident w/Kafka

Page 18: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

18

Vowpal Wabbit

• Online/out-of-core learning

• Extraordinarily fast – handles large sparse feature spaces

• Broad selection of algorithms

• Scalable (all reduce)

• Active & reinforcement learning

• Contextual bandit FTW!

Page 19: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

19

Elastic Search

• Generic, sharded & redundant search engine

• Real-time indexing

• Distinct master, data, ingest, coordination node roles

• Handles semi-structured data

• Kibana: secret weapon

Page 20: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

20

Kibana

Page 21: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

21

Abuse Prevention

Page 22: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

22

Example - Analytics

Page 23: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

FanGraph

Auto Delist on Entry

Page 24: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions

24

Lambda Architecture FTW!

• Reusuable functions

• Dev team autonomy/loose coupling

• Simplifies handling failure

• Simplifies scaling

• Complex computation with low latency

• Even less computationally intensive tasks become easier

Page 25: Doing It Live - SCALE 16x · PDF fileElastic Search Generic, sharded & redundant search engine Real-time indexing Distinct master, ... Lambda Architecture FTW! Reusuable functions