The evolution of advertising technology and the importance of personalization


Transcript of The evolution of advertising technology and the importance of personalization

Page 2: The evolution of advertising technology and the importance of personalization

EXPECTATIONS

Page 3: The evolution of advertising technology and the importance of personalization

AGENDA

§ Introduction

§ Data Collection

§ Data Consumption

§ Data Analysis

§ Exposing Data

§ Q & A

Page 4: The evolution of advertising technology and the importance of personalization

DharmicData, Data Center of Excellence

Data Strategy

Data Management

Platform

Data-Driven Solutions

KPIs and experimentation

Transforming the whole Value Chain

http://www.dharmicdata.com @dharmicdata @fsroque @moshtan

Page 6: The evolution of advertising technology and the importance of personalization

Data is everywhere

(BIG) DATA

Contains information

Extracting information allows us to act proactively

Overcome problems

Optimize returns

Page 7: The evolution of advertising technology and the importance of personalization

The more data we can collect and use, the more information we’ll generate and the better we’ll operate.

(BIG) DATA

Page 8: The evolution of advertising technology and the importance of personalization

Varying perceptions of “BIG DATA”

Page 9: The evolution of advertising technology and the importance of personalization

BIG DATA

“It’s about being smarter with your data?”

“It means making faster decisions?”

“It simply means more data?”

“It’s about cheaper storage technology?”

“It’s all about social media?”

Page 10: The evolution of advertising technology and the importance of personalization

What is (Big) Data?

Big Data

Volume: Data at scale. Terabytes to petabytes of data (or when your work processes dictate)

Velocity: Data in motion. Enabling real-time decisions

Variety: Data in many forms. Structured, unstructured, text, multimedia

Value: Data in numbers. Extracting business insights and revenue from data

Page 11: The evolution of advertising technology and the importance of personalization

Trends driving the importance of Big Data

Businesses' Big Data objectives

Customer-centric outcomes: 49%

Operational optimization: 18%

Risk/Financial management: 15%

New business model: 14%

Employee collaboration: 4%

“Analytics, the real world use case of Big Data”. IBM Institute of Business Value Study, October 2012

• Everything is digitized

• Advanced analytics technologies

• Customer-centricity, ‘smarter’ solutions

Page 12: The evolution of advertising technology and the importance of personalization

Data is everywhere, and should be accessible

PIPELINES

SENSORS

SOCIAL INTERACTIONS

BEHAVIOR

CONSUMPTION

SOLUTION’S QUALITY

CAPTURE DATA

$$$

Page 13: The evolution of advertising technology and the importance of personalization

Capturing most Data

PIPELINES

Turn it to valuable information

A pipeline ties several data processing steps together
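As a minimal sketch of this idea, assuming Python and a made-up "user,amount" record format, a pipeline can be modeled as a chain of small steps where each step consumes the output of the previous one. The step names (parse, drop_invalid, total_per_user) are illustrative and not taken from the talk.

```python
from functools import reduce
from typing import Callable, Iterable

Step = Callable[[Iterable], Iterable]


def pipeline(*steps: Step) -> Step:
    """Tie several processing steps together into one callable."""
    return lambda records: reduce(lambda data, step: step(data), steps, records)


def parse(lines):
    # Hypothetical raw format: "user,amount"
    for line in lines:
        user, amount = line.split(",")
        yield {"user": user, "amount": float(amount)}


def drop_invalid(records):
    # Cleaning step: keep only positive amounts
    return (r for r in records if r["amount"] > 0)


def total_per_user(records):
    # Aggregation step: emit a single dict of totals per user
    totals = {}
    for r in records:
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]
    yield totals


run = pipeline(parse, drop_invalid, total_per_user)
result = next(iter(run(["ann,10", "bob,-1", "ann,5"])))
```

Because every step only depends on an iterable coming in, steps can be added, removed, or reordered without touching the others.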

Page 14: The evolution of advertising technology and the importance of personalization

ISSUES DEALING WITH DATA

From batches to pipelines

Page 15: The evolution of advertising technology and the importance of personalization

#1 TRANSPORTING DATA BETWEEN SYSTEMS

Page 16: The evolution of advertising technology and the importance of personalization

DATA INTEGRATION (~ETL)

RELATIONAL DATABASES

HADOOP

SEARCH AND INDEXING

MONITORING

KEY-VALUE STORES

Page 17: The evolution of advertising technology and the importance of personalization

http://www.confluent.io/blog/stream-data-platform-1/

THE SPAGHETTI MONSTER

Page 18: The evolution of advertising technology and the importance of personalization

#2 NEED FOR RICH ANALYTICAL DATA PROCESSING

Page 19: The evolution of advertising technology and the importance of personalization

VERY LOW LATENCY DATA PROCESSING

STREAM PROCESSING

REAL-TIME ANALYTICS

DATA CLOSE TO PROCESSING

Page 20: The evolution of advertising technology and the importance of personalization

MORE ISSUES

• Lossy and high-latency connections

• Segmented (siloed) data sources

• Batched database migrations, data insertions, etc.

• Unscalable, tightly connected systems

• ‘Duct-taped’ connections

• Unreliable data – leading to a lot of QA

• No room for data processing outside of batch, data archival or ad hoc processing

Page 21: The evolution of advertising technology and the importance of personalization

http://www.confluent.io/blog/stream-data-platform-1/

STREAM DATA PLATFORM TO THE RESCUE

Page 22: The evolution of advertising technology and the importance of personalization

UNIVERSAL DATA PIPELINE

CONTINUOUS FEEDS OF WELL-FORMED DATA

Page 23: The evolution of advertising technology and the importance of personalization

http://www.confluent.io/blog/stream-data-platform-1/

THE STREAM DATA PLATFORM

Page 24: The evolution of advertising technology and the importance of personalization

REAL-TIME STREAM PROCESSING

STREAM

User profile

Enrich user profile

Store in db

Predict user behavior

Target user
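A rough sketch of the flow on this slide (stream, enrich user profile, store, predict, target), assuming Python. The in-memory dict stands in for the database and the "most-viewed category" rule stands in for a real prediction model; both are placeholders, as are the event fields.

```python
from collections import defaultdict

# Stand-in for the profile database (assumption: one profile per user id)
profiles = defaultdict(lambda: {"views": 0, "categories": defaultdict(int)})


def enrich(event):
    """Update the stored profile with one incoming event."""
    profile = profiles[event["user"]]
    profile["views"] += 1
    profile["categories"][event["category"]] += 1
    return profile


def predict_interest(profile):
    # Placeholder "model": the category the user viewed most often
    return max(profile["categories"], key=profile["categories"].get)


def handle(event):
    """Process one event from the stream and return a targeting decision."""
    profile = enrich(event)  # enrich and store
    return {"user": event["user"], "target_with": predict_interest(profile)}


stream = [
    {"user": "u1", "category": "shoes"},
    {"user": "u1", "category": "bags"},
    {"user": "u1", "category": "shoes"},
]
decisions = [handle(event) for event in stream]
```

Each event is handled as it arrives, so the targeting decision always reflects the latest profile rather than the result of the last batch job.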

Page 25: The evolution of advertising technology and the importance of personalization

WHAT DOES A STREAM DATA PLATFORM NEED TO DO?

FAST?

HIGH THROUGHPUT?

SCALE WELL?

Page 26: The evolution of advertising technology and the importance of personalization

KEY REQUIREMENTS FOR A STREAM DATA PLATFORM

• Reliable, no data loss

• High Throughput to handle large event data

• Persist data for longer periods, to enable batch-based workflows

• Low latency data for real-time applications

• Central system

• Close integration with stream processing systems

Page 27: The evolution of advertising technology and the importance of personalization

STREAM DATA PLATFORM RELATED TO EXISTING THINGS

Page 28: The evolution of advertising technology and the importance of personalization

STREAM DATA PLATFORM vs. ENTERPRISE MESSAGING SYSTEM

Enterprise messaging system: One-off deployment. Limited storage capacity.

Stream data platform: Central data hub. Large log history.

Page 29: The evolution of advertising technology and the importance of personalization

STREAM DATA PLATFORM vs. DATA INTEGRATION TOOLS

Data integration tools: Disparate tools and deployments. Many routine data-cleanup steps.

Stream data platform: True platform. Stream abstraction and data locality make it easier to tap into and build applications around a stream.

Page 30: The evolution of advertising technology and the importance of personalization

STREAM DATA PLATFORM vs. ENTERPRISE SERVICE BUSES

Enterprise service buses: Transformation logic is embedded. Processing tasks need agreement from multiple stakeholders.

Stream data platform: Data transformation is decoupled from the stream. Individual teams can use and reuse streams, no bottleneck.

Page 31: The evolution of advertising technology and the importance of personalization

STREAM DATA PLATFORM vs. DATA WAREHOUSES AND HADOOP

Quickly flow data

Publish results

Long-term storage

Page 32: The evolution of advertising technology and the importance of personalization

A BIG IDEA

The democratization of “data” => Making data available through more of the organization

The democratization of the “cluster” => Making data + resources available through more of the organization

Page 33: The evolution of advertising technology and the importance of personalization

In the Hypervisor world, low utilization has been widely observed

A McKinsey study in 2008 pegged data-center utilization at roughly 6 percent.

A Gartner report from 2012 put the industry-wide utilization rate at 12 percent.

An Accenture paper, from 2011, sampling Amazon EC2 machines found 7 percent utilization over the course of a week.

Page 34: The evolution of advertising technology and the importance of personalization

The business case for data Warehouse Scale Computing

Page 35: The evolution of advertising technology and the importance of personalization

Arguments for WSC

Rather than running several specialized clusters, each at relatively low utilization rates, run many mixed workloads together. Obvious benefits are realized in terms of:

• scalability, elasticity, fault tolerance, performance, utilization

• reduced equipment capex, Ops overhead, etc.

• reduced licensing, eliminating need for VMs or potential vendor lock-in

• reduced time for engineers to ramp up new services at scale

• reduced latency between batch and services, enabling new high-ROI use cases

• enables Dev/Test apps to run safely on a Production cluster

• eases deployment

Page 36: The evolution of advertising technology and the importance of personalization

Prior Practice

• Low utilization rates

• Longer time to ramp up new services

• Even more machines to manage

• Substantial performance decrease

• VM licensing costs and specific data center vendor economics

• Failures make static partitioning more complex to manage

Page 37: The evolution of advertising technology and the importance of personalization

Current Practice: WSC

“We wanted people to be able to program for the datacenter just like they program for their laptop.”

- Ben Hindman, Co-creator of Mesos

Page 38: The evolution of advertising technology and the importance of personalization

WAREHOUSE SCALE COMPUTING

Server management and granular resource allocation

External scalability, horizontal scalability, health checks, monitoring, and scheduling

MESOS

MARATHON

CHRONOS

Page 39: The evolution of advertising technology and the importance of personalization

BREAK

Page 41: The evolution of advertising technology and the importance of personalization

WHERE TO FIND DATA?

Step 0. Find Data

Step 1. Ingest Data

Page 42: The evolution of advertising technology and the importance of personalization

GETTING DATA

REQUEST/RESPONSE

STREAMING

Page 43: The evolution of advertising technology and the importance of personalization

REQUEST/RESPONSE

Classic: the client just asks the third party

Issue a request, return a response

- What if the service returns a lot of data?

- What if the service generates data very fast?

- What if the data points will only be sent once?

Page 44: The evolution of advertising technology and the importance of personalization

STREAMING

A permanent connection is made between the service and the consumer

The data flows continuously through the pipe. The consumer subscribes to the service

What if the incoming data rate is too high?
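One possible answer to the question above, sketched in Python under the assumption that losing the oldest data is acceptable: put a bounded buffer between the stream and the consumer. The class name BoundedBuffer and the drop-oldest policy are illustrative choices; blocking the producer, sampling, or adding consumers are alternative strategies.

```python
from collections import deque


class BoundedBuffer:
    """Fixed-capacity buffer that evicts the oldest item when full."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)
        self.dropped = 0  # how many items were evicted because the consumer lagged

    def push(self, item):
        if len(self.items) == self.items.maxlen:
            self.dropped += 1  # the append below evicts the oldest item
        self.items.append(item)

    def drain(self, n):
        """Hand up to n of the oldest buffered items to the consumer."""
        return [self.items.popleft() for _ in range(min(n, len(self.items)))]


buffer = BoundedBuffer(capacity=3)
for reading in range(5):  # producer is faster than the consumer
    buffer.push(reading)
consumed = buffer.drain(2)
```

Tracking the number of dropped items makes the overload visible, which is usually the first step before deciding whether to scale the collection layer.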

Page 45: The evolution of advertising technology and the importance of personalization

MICROSERVICES ARCHITECTURE

Thin collection layer:

• Pass the data to the next layer (the queue)

• Scalable vertically (increase req. rate)

• Scalable horizontally (support fast data)

• Extendable (capture new sources and types of data)

Page 46: The evolution of advertising technology and the importance of personalization

DMPs and Data Collection

What is a data management platform?

In simple terms, a data management platform is a data warehouse. It’s a piece of software that sucks up, sorts and houses information, and spits it out in a way that’s useful for marketers, publishers and other businesses.

Page 47: The evolution of advertising technology and the importance of personalization

Data Collection at DD

Page 48: The evolution of advertising technology and the importance of personalization

EVENTS

Orders, Sales, Clicks

Sensor Data

Databases?

Page 49: The evolution of advertising technology and the importance of personalization

SIGNALS – CAPTURING USER BEHAVIOR

Page 50: The evolution of advertising technology and the importance of personalization

“Big Data, the future of logistics”. Luxembourg-Poland Business Club, KPMG

Page 52: The evolution of advertising technology and the importance of personalization

QUEUES

[Diagram: nodes labeled P and S connected through queues]

PUB/SUB

“… put them (messages) on a software bus where all processes can see them”

- Gartner
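A minimal in-process sketch of the pub/sub idea, assuming Python. The Bus class and topic names are made up for illustration; a real message bus would add persistence, networking, and delivery guarantees.

```python
from collections import defaultdict


class Bus:
    """Toy software bus: publishers and subscribers only know topic names."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber of the topic sees every message
        for callback in self.subscribers[topic]:
            callback(message)


bus = Bus()
billing, analytics = [], []
bus.subscribe("orders", billing.append)
bus.subscribe("orders", analytics.append)
bus.publish("orders", {"order_id": 1, "amount": 20})
bus.publish("clicks", {"page": "/home"})  # no subscribers, nothing happens
```

The publisher never references billing or analytics directly, which is what lets new consumers be added without changing the producing system.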

Page 53: The evolution of advertising technology and the importance of personalization

A COMMIT LOG

[Figure: log records at offsets 0 through 9 along a time axis, from the 1st record to the next record written]

A log is perhaps the simplest possible storage abstraction. It is an append-only, totally ordered sequence of records, ordered by time.
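To make the abstraction concrete, here is a small Python sketch of an append-only log. The CommitLog class is hypothetical and keeps everything in memory; the point is only that records get increasing offsets and that each reader tracks its own position.

```python
class CommitLog:
    """Append-only, totally ordered sequence of records."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Add a record at the end and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


log = CommitLog()
offsets = [log.append(r) for r in ("signup", "click", "purchase")]

# Two independent consumers read the same log at their own pace
fast_consumer = log.read(offset=0)
slow_consumer = log.read(offset=0, max_records=1)
```

Since records are never modified in place, re-reading from an earlier offset always yields the same sequence, which is what makes replay possible.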

Page 54: The evolution of advertising technology and the importance of personalization

A DISTRIBUTED COMMIT LOG

[Figure: Partition 0, Partition 1, and Partition 2, each an ordered run of record offsets (0, 1, 2, ...) from old to new, with writes appended at the new end]

Is partitioned and replicated across multiple nodes.

Scalable

Time (duration configurable)
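A sketch of the partitioning half of this slide, assuming Python. Records are routed to a partition by hashing a key, so each partition stays an ordered log on its own. The CRC32-based routing is an assumption made for determinism in this example and is not the partitioner of any specific product; replication is not shown.

```python
import zlib


class PartitionedLog:
    """Several independent append-only logs, selected by record key."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Deterministic hash so the same key always maps to the same partition
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append to the key's partition; return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


plog = PartitionedLog(num_partitions=3)
p1, o1 = plog.append("user-1", "click")
p2, o2 = plog.append("user-1", "purchase")
```

Ordering is only guaranteed within a partition, so keys whose events must stay in order (for example, one user's events) should map to the same partition.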

Page 55: The evolution of advertising technology and the importance of personalization

http://www.confluent.io/blog/stream-data-platform-1/

Page 56: The evolution of advertising technology and the importance of personalization

SPARK STREAMING

1. Receive streaming data from data sources

2. Process the data

3. Output the results to downstream systems

Page 57: The evolution of advertising technology and the importance of personalization

Architecture of SPARK Streaming: Discretized Streams

batches (RDDs)

Receiver

Records are processed in batches, and each batch is an RDD, a partitioned dataset.
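The micro-batching idea can be illustrated without Spark. The following plain-Python sketch (not the Spark API) groups timestamped records into fixed-interval batches and then runs an ordinary batch computation on each one; the function name discretize and the sample records are made up.

```python
from itertools import groupby


def discretize(records, batch_interval):
    """Group (timestamp, value) records into consecutive fixed-interval batches."""
    ordered = sorted(records, key=lambda r: r[0])
    by_interval = groupby(ordered, key=lambda r: int(r[0] // batch_interval))
    for batch_id, group in by_interval:
        yield batch_id, [value for _, value in group]


records = [(0.2, "a"), (0.7, "b"), (1.1, "c"), (2.5, "d")]
batches = list(discretize(records, batch_interval=1.0))

# Any batch computation can now be applied per batch, here a simple count
counts = [(batch_id, len(values)) for batch_id, values in batches]
```

The batch interval is the trade-off knob: shorter intervals lower latency, longer intervals amortize per-batch overhead.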

Page 58: The evolution of advertising technology and the importance of personalization

Event-driven Applications

Stream processing

Real time Event data

Streaming data platforms

Page 59: The evolution of advertising technology and the importance of personalization

Event-data Can Be Thought of as Event-Streams

Evolution of Traditional ETL

1. Production DB to Standby DB: periodic full backups

2. Production DB to Standby DB: frequent diffs

3. Production DB to Standby DB: even more frequent diffing

[Figure: each stage shows the amount of data transferred]

What we are left with is a continuous sequence of single-row changes.
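The step from diffs to a change stream can be sketched in Python. The snapshot format (a dict keyed by primary key) and the three operation names are assumptions for illustration; the point is that replaying the row-level changes on the old snapshot reproduces the new one.

```python
def diff(old, new):
    """Yield row-level changes that turn snapshot `old` into snapshot `new`."""
    for key in old.keys() - new.keys():
        yield ("delete", key, None)
    for key, row in new.items():
        if key not in old:
            yield ("insert", key, row)
        elif old[key] != row:
            yield ("update", key, row)


def apply_changes(snapshot, changes):
    """Replay a sequence of changes on top of a snapshot (the standby DB)."""
    result = dict(snapshot)
    for op, key, row in changes:
        if op == "delete":
            result.pop(key, None)
        else:  # insert and update both set the row
            result[key] = row
    return result


production_before = {1: {"name": "ann"}, 2: {"name": "bob"}}
production_after = {1: {"name": "ann b."}, 3: {"name": "cy"}}

changes = list(diff(production_before, production_after))
standby = apply_changes(production_before, changes)
```

As the diffing interval shrinks, each diff contains fewer rows, until the standby is effectively consuming one change event at a time.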

Page 60: The evolution of advertising technology and the importance of personalization

Scalable micro-services

[Diagram: groups of micro-service instances shown for 100s, 1,000s, 10,000s, and 100,000s of users]

Page 61: The evolution of advertising technology and the importance of personalization

WORKSHOP

• Split into groups of 3-4

• Discuss the question:

• “How can Big Data, pipelining, streaming, and real-time computing apply to my current or future assignments?”

• One lucky group will be randomly selected to present :)

Page 62: The evolution of advertising technology and the importance of personalization

BREAK

Page 64: The evolution of advertising technology and the importance of personalization

THE EXPLORATORY PHASE

Page 65: The evolution of advertising technology and the importance of personalization

EXPLORATORY PHASE

Unavoidable

Understand the data you are working with

Computationally Expensive

Lots of retries: choosing the model down the line will involve trial and error

Page 66: The evolution of advertising technology and the importance of personalization

PIPELINING (UNIFIED DATA SILOS)

Page 67: The evolution of advertising technology and the importance of personalization

Productizing Data Science

MODELING

CODING

DEPLOYING

Finding data

Parsing structures

Cleaning

Reducing

Learning

Predicting

Connect to prod data

Tuning training parameters

Create prediction service

Generate deployable model

Connect to prod infrastructure

Integration with existing ENV

Allocate and schedule resources

Ensure availability

Page 68: The evolution of advertising technology and the importance of personalization

Extended Pipeline

[Diagram: pipeline phases COLLECT, MODEL, CODE, DEPLOY, CREATE SERVICES, and INTEGRATE APPLICATION, mapped onto the COLLECTION TIER, QUEUING TIER, IN-MEMORY TIER, COMPUTING TIER, RESOURCE MANAGER TIER, and SERVICE TIER]

CREATING SERVICES

Abstracts access to prepared views

Exposes prediction capabilities

Highly horizontally scalable

Scaling micro-services cluster → cheaper than computing cluster

“Extra” Coding phase

Page 69: The evolution of advertising technology and the importance of personalization

EXPLORATORY PHASE – NEED FOR SPEED

We can’t afford losing time due to an inefficient toolset

Interactivity and reactivity to find the optimal result and move forward

NOTEBOOK

REPL evolution

DASHBOARD

Page 70: The evolution of advertising technology and the importance of personalization

SPARK NOTEBOOK

http://spark-notebook.io/

Spark + Scala

Exploration of Big Data

Page 71: The evolution of advertising technology and the importance of personalization

NOTEBOOK DEMO

Page 72: The evolution of advertising technology and the importance of personalization

DASHBOARD

http://redash.io/

Connect to any DB

(Custom) HDFS integration via Drill

Interactive

SQL-like querying

Page 73: The evolution of advertising technology and the importance of personalization

DASHBOARD DEMO

Page 76: The evolution of advertising technology and the importance of personalization

Exposing Views on data

The data science pipeline now has to include the way for results to be consumed by third parties

(service oriented architecture)

What are the results?

Intermediate results and the model need to be exposed

Having services for views (APIs) allows us to abstract the way they are created.

Page 77: The evolution of advertising technology and the importance of personalization

APIs

Expose a stream

Expose intermediate results

Expose models

Page 78: The evolution of advertising technology and the importance of personalization

APIs

STREAM

Events

Current events/sec

Increment counter

API

Total number of events

Average events/sec

# occurrences of specific event

Event details
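A small Python sketch of how a stream could feed the counters listed on this slide. The EventStats class, its method names, and the sample events are assumptions; an actual service would expose these methods over HTTP and keep state somewhere more durable than process memory.

```python
from collections import Counter


class EventStats:
    """Running counters updated per event and read by an API layer."""

    def __init__(self):
        self.counts = Counter()
        self.first_ts = None
        self.last_ts = None

    def record(self, event_type, ts):
        """Increment counters for one event with timestamp ts (seconds)."""
        self.counts[event_type] += 1
        if self.first_ts is None:
            self.first_ts = ts
        self.last_ts = ts

    def total(self):
        return sum(self.counts.values())

    def occurrences(self, event_type):
        return self.counts[event_type]

    def average_per_sec(self):
        if self.first_ts is None:
            return 0.0
        elapsed = self.last_ts - self.first_ts
        # With a single timestamp there is no interval to average over
        return self.total() / elapsed if elapsed > 0 else float(self.total())


stats = EventStats()
for event_type, ts in [("click", 0.0), ("view", 1.0), ("click", 2.0), ("click", 4.0)]:
    stats.record(event_type, ts)
```

Callers of the API only see total, occurrences, and average_per_sec, so the way the counters are computed can change without affecting consumers.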

Page 79: The evolution of advertising technology and the importance of personalization

EVENT-DRIVEN APPLICATIONS

Monitor and respond. Real-time
