The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

The Rise of Digital Audio: Dwelling between BIG Data and Fast Data Philippe-Alexandre Leroux | Chief Operating Officer Bogdan Bocse | Solutions Architect

Transcript of The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Page 1: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

The Rise of Digital Audio: Dwelling between BIG Data and Fast Data

Philippe-Alexandre Leroux | Chief Operating Officer

Bogdan Bocse | Solutions Architect

Page 2: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

The way we consume music has evolved

Page 3: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Music is part of our lives, just not like before

Page 4: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

We can now consume music in many different ways

On Demand | Live Radios | Custom Radios

Page 5: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

So what’s different now?

It’s now interactive, connected and tailored around users…

= New opportunities for publishers & advertisers

Page 6: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

What does it mean for the industry?

Fewer people are buying CDs

+

Publishers and Artists need new revenue models

+

Advertisers want Digital Audio to be as easy as Display or Video

=

Great opportunity for an Ad Tech company to power the Digital Audio Revolution!

Page 7: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Where is AdsWizz in all that?

Page 8: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

We are NOT an airline

Page 9: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

We power the Digital Audio revolution

Audience Analytics

AdServing

Audio Streaming

SSP

DSP

Real-Time Bidding

Real-Time reports

Supply Intelligence

Content Analysis

Mobile SDKs

Real-Time Ad Insertion

Traffic Forecasting

Page 10: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Some numbers

#5B+ impressions per month
#3500+ broadcast stations
#10 000 custom stations
#1000 podcast shows
#100+ Amazon nodes
#1+ Million concurrent sessions
#100 Swizzers
#7 offices worldwide

Page 11: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Some of the cool brands we work with

Page 12: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

How do we use Big Data?*

* It’s not just for showing off

Page 13: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)
Page 14: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Understand user trends

[Chart: UK Online Listening Media Day. Horizontal axis: time of day in half-hour steps from 0:00 to 23:30. Annotations: Commute, Lunch break, Daily peak]

Page 15: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Real-time user profiling

Page 16: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

RTB is like the stock exchange, but with ads

Page 17: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)
Page 18: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Traditional “small” data solutions simply don’t work

For every single transaction we collect 20+ data points

Applied to 5+ billion monthly impressions

A database which grows by 1TB per day

Good luck serving close to real-time queries with MySQL
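As a rough sanity check on the figures above, the sketch below turns them into per-second and per-record numbers. It uses the slide's lower bounds (5 billion impressions, 20 data points). The 30-day month, the decimal terabyte, and the treatment of one impression as one transaction are assumptions made for illustration, not statements from the talk, and the daily growth may well include more than raw data points.

```python
# Back-of-envelope arithmetic on the slide's figures (lower bounds).
IMPRESSIONS_PER_MONTH = 5_000_000_000   # "5+ billion monthly impressions"
DATA_POINTS_PER_IMPRESSION = 20         # "20+ data points" per transaction
GROWTH_BYTES_PER_DAY = 10**12           # "1TB per day", decimal TB (assumption)
DAYS_PER_MONTH = 30                     # assumption
SECONDS_PER_DAY = 86_400

impressions_per_day = IMPRESSIONS_PER_MONTH / DAYS_PER_MONTH
impressions_per_second = impressions_per_day / SECONDS_PER_DAY
data_points_per_day = impressions_per_day * DATA_POINTS_PER_IMPRESSION

# Implied storage footprint if all growth were attributed to impressions.
bytes_per_impression = GROWTH_BYTES_PER_DAY / impressions_per_day
bytes_per_data_point = bytes_per_impression / DATA_POINTS_PER_IMPRESSION

print(f"impressions per day:    {impressions_per_day:,.0f}")
print(f"impressions per second: {impressions_per_second:,.0f} (average)")
print(f"data points per day:    {data_points_per_day:,.0f}")
print(f"bytes per impression:   {bytes_per_impression:,.0f}")
print(f"bytes per data point:   {bytes_per_data_point:,.0f}")
```

Under these assumptions the store ingests on the order of two thousand impressions per second on average, and several billion data points per day, which is the scale at which a single relational instance struggles to answer ad hoc queries in near real time.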


Page 19: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Yeah, yeah, it’s all BIG. What else?

Fast
• Cache-Aside Pattern
• Redis
• Memcached

Complex Query
• Data Warehousing
• Redshift
• Hadoop

Structured Query
• Sorted key-value stores
• HBase
• DynamoDB
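The cache-aside pattern listed under "Fast" can be sketched as follows: the application checks the cache first, falls back to the slower store on a miss, and then populates the cache itself. This is a minimal illustration only. An in-process dictionary stands in for Redis or Memcached, and the names `CacheAside`, `load_profile`, and `SLOW_STORE` are hypothetical, not taken from the talk.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Read-through-by-hand: the caller owns both the cache and the store."""

    def __init__(self, load_from_store: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 60.0) -> None:
        self._load = load_from_store
        self._ttl = ttl_seconds
        # key -> (expiry time, value); a dict stands in for Redis/Memcached.
        self._cache: Dict[str, Tuple[float, dict]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        now = time.monotonic()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1          # fresh entry: serve from the cache
            return entry[1]
        self.misses += 1            # absent or expired: go to the store
        value = self._load(key)
        if value is not None:
            self._cache[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying record has been updated."""
        self._cache.pop(key, None)


# Hypothetical slow store and loader, used only to exercise the pattern.
SLOW_STORE = {"user-1": {"favourite_genre": "rock"}}
store_reads = []


def load_profile(user_id: str) -> Optional[dict]:
    store_reads.append(user_id)     # record every trip to the slow store
    return SLOW_STORE.get(user_id)


profiles = CacheAside(load_profile, ttl_seconds=30.0)
first = profiles.get("user-1")      # miss: loaded from the store
second = profiles.get("user-1")     # hit: served from the cache
```

The time-to-live bounds how stale a cached value can get, which is the trade-off the pattern makes in exchange for fast reads.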

Page 20: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)
Page 21: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Use Case #1: Handling User Profiles

Page 22: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Use Case #2: Distributed Worker

Page 23: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Use Case #3: Distributed Worker + Data Warehouse

Page 24: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Use Case #4: Distributed Worker + Data Warehouse + State Store

Page 25: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

An evolving tech stack

Page 26: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Join the ride

Page 27: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

We are looking for new Swizzers to join

BIG DATA ENGINEER FOR DATA SCIENCE TEAM

MAD DEVOPS NINJA

INCIDENT MANAGER

SUPER VILLAIN (ÜBER JAVA DEVELOPER)

SENIOR MOBILE DEVELOPER (ANDROID/iOS)

SENIOR QA INTEGRATION ENGINEER

PHP / AngularJS DEVELOPER

SENIOR IT PROJECT MANAGER

[email protected]

Page 28: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)
Page 29: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Philippe-Alexandre Leroux | Chief Operating Officer | [email protected]

Bogdan Bocșe | Solutions Architect | [email protected]

@followadswizz


Page 31: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Backup Slides (on the off-chance 20 minutes are enough)

Page 32: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

What’s it called? | What does it mean?

Volumetry | If it’s less than 100 GB, don’t bother calling it Big Data

Atomic Query Size | Are you reading 10 or 10 million records per transaction?

Query Load | Do you expect 5 or 5000 queries per second?

Response Time | Do you expect your data store to answer in 1ms, 10ms or 10s?

Immutability | Once your data is written, does it stay written?

Strict Consistency | Do you need changes to be instantly visible to all readers?

Data Freshness | Do you need the absolute latest data, to the millisecond?

ACID Compliance | If you work with ordering or payments, you want transactions.

Query Accuracy | Is there room for error in the results of your queries?

Persistence/Durability | Should data be stored on a permanent medium (HDD, SSD)?

High Availability | Is it required that the data store stays available throughout hardware and network failures?

Page 33: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Big
• Cost grows linearly with data size
• No performance degradation with size

Flexible
• On-the-fly queries

Accurate
• Exact computation
• Estimate result
• Strict consistency

Fast
• Fast Reads
• Fast Writes
• Fast Updates

Cost & Complexity

Page 34: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Redshift: Queries at Scale

• Tables have sort keys (like indexes)
• Tables have one distribution key
  • Defines how data is split over nodes
• Tables are split in sorted regions
• Each region has several slices spread across nodes
• Split across several instances
• Each column has its own compression type
• SSD-enabled (200 GB node)
• Results ….
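The roles of the distribution key and the sort key can be illustrated with a toy model. This is a conceptual sketch only: it is not Redshift's storage engine, and the CRC32 hash, the node count, and the column names (`station`, `ts`, `listeners`) are assumptions chosen for the example. The distribution key decides which node a row lands on, and the sort key keeps each node's rows ordered so that a range filter can skip most of them.

```python
import zlib
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List

NUM_NODES = 4  # hypothetical cluster size


def node_for(dist_key_value: str, num_nodes: int = NUM_NODES) -> int:
    """Map a distribution-key value to a node using a stable hash."""
    return zlib.crc32(dist_key_value.encode("utf-8")) % num_nodes


def distribute(rows: List[dict], dist_key: str, sort_key: str,
               num_nodes: int = NUM_NODES) -> Dict[int, List[dict]]:
    """Split rows across nodes by dist_key, then sort each node by sort_key."""
    nodes: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        nodes[node_for(row[dist_key], num_nodes)].append(row)
    for node_rows in nodes.values():
        node_rows.sort(key=lambda r: r[sort_key])
    return dict(nodes)


def range_scan(node_rows: List[dict], sort_key: str, low, high) -> List[dict]:
    """Return rows with low <= sort_key <= high via binary search on sorted rows."""
    keys = [r[sort_key] for r in node_rows]
    return node_rows[bisect_left(keys, low):bisect_right(keys, high)]


# Hypothetical listening events, deliberately inserted in reverse time order.
rows = [{"station": f"station-{i % 5}", "ts": i, "listeners": i * 10}
        for i in range(19, -1, -1)]

cluster = distribute(rows, dist_key="station", sort_key="ts")
for node, node_rows in sorted(cluster.items()):
    matching = range_scan(node_rows, "ts", 5, 10)
    print(node, [r["ts"] for r in matching])
```

Because every row for a given station lands on the same node, an aggregation grouped by station needs no data exchange between nodes in this model, which is the intuition behind choosing a distribution key that matches the most common grouping or join column.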

Page 35: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)
Page 36: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)
Page 37: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

The Results

• The query on the previous slide (it is actually 4-5 A4 pages long)
• Over 39,031,958 rows (100-150 GB)
• Took 4.039s

* The data store stores 3 TB over 12 instances

Page 38: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)
Page 39: The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)

Ordered-Bucket Sampling

Let’s say we want to sample 20% of events for a specific scenario.

We split events into 10 buckets, depending on the hash of their “user id”.

Bucket #1 Bucket #2 Bucket #3 Bucket #4 Bucket #5 Bucket #6 Bucket #7 Bucket #8 Bucket #9 Bucket #10

er1bhUygQoRrPvonNRyw -(hash)> Bucket 3

32m9bGzQQMs7162ObeRt -(hash)> Bucket 7

(…)

Then we sample only those events from Bucket #1 and Bucket #2.
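The sampling scheme above can be sketched in a few lines. The slide does not say which hash function is used, so SHA-256 here is an assumption, and the bucket numbers it produces will not match the slide's examples. The function names and the `user_id` field are likewise hypothetical.

```python
import hashlib
from typing import List

NUM_BUCKETS = 10


def bucket_for(user_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Return a 1-based bucket number derived from a stable hash of the user id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets + 1


def is_sampled(user_id: str, sample_percent: int,
               num_buckets: int = NUM_BUCKETS) -> bool:
    """Keep a user when their bucket is among the first N buckets."""
    buckets_to_keep = sample_percent * num_buckets // 100  # 20% of 10 -> 2
    return bucket_for(user_id, num_buckets) <= buckets_to_keep


def sample_events(events: List[dict], sample_percent: int) -> List[dict]:
    """Filter events, keeping all events of sampled users and none of the others."""
    return [e for e in events if is_sampled(e["user_id"], sample_percent)]


# Hypothetical events: one per synthetic user.
events = [{"user_id": f"user-{i}", "n": i} for i in range(1000)]
sample_20 = sample_events(events, 20)   # users in Bucket #1 and Bucket #2
sample_30 = sample_events(events, 30)   # adds Bucket #3 on top of the 20% set
print(len(sample_20), len(sample_30))
```

Two properties follow from hashing the user id rather than drawing a random number per event: a given user is either always in the sample or never in it, and, because buckets are taken in order, a smaller sample is always a subset of a larger one.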