Transcript
Page 1: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

www.dataiku.com

Take back control of your Web Tracking

@ClementStenac CTO, Dataiku

Page 2: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Give me dashboards !

Page 3: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Choose one

Raw data: do what you want

Your money

Access to raw data is a premium feature

Page 4: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Who cares about raw data ?

• SaaS analytics are full-featured

• Custom variables to link with your backend data

• Did you really join all the data you will need in the future ?

• Do you have access to all the necessary data, and do you want to push it to the JS ?

• What kinds of analysis will you do later on ?

Page 5: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

A real example

Segmentation and tracking user-satisfaction

Raw tracking data

User-level stats

User base segmentation

Metrics per segment

Tracking over time

Data volume: from TB (raw data) down to GB

Page 6: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

User-level data
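
As a rough illustration of how raw tracking events can be collapsed into user-level data, here is a minimal sketch. The event fields (`visitor_id`, `session_id`, `page`) and the aggregates kept per visitor are assumptions made for this example, not a schema given in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UserStats:
    """Per-visitor aggregates; the chosen fields are illustrative only."""
    events: int = 0
    sessions: set = field(default_factory=set)
    pages: set = field(default_factory=set)


def user_level_stats(raw_events):
    """Collapse raw tracking events into one UserStats row per visitor_id."""
    stats = defaultdict(UserStats)
    for event in raw_events:
        row = stats[event["visitor_id"]]
        row.events += 1
        row.sessions.add(event["session_id"])
        row.pages.add(event["page"])
    return dict(stats)


# Hypothetical raw events, one dict per tracked page view.
raw_events = [
    {"visitor_id": "d37ecba", "session_id": "s1", "page": "/"},
    {"visitor_id": "d37ecba", "session_id": "s1", "page": "/news"},
    {"visitor_id": "d37ecba", "session_id": "s2", "page": "/"},
    {"visitor_id": "33d7", "session_id": "s3", "page": "/"},
]
stats = user_level_stats(raw_events)
```

Each row of `stats` can then serve as the feature vector for one visitor in the segmentation step.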

Page 7: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Clustering
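
The slide does not name a clustering algorithm. As one possible illustration, the sketch below runs a plain k-means on two made-up user-level features. The choice of k-means, the features and the sample values are all assumptions for this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on equal-length numeric tuples.

    Initial centroids are the first k points, which keeps the run deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Hypothetical user-level features: (sessions per week, comments per session).
users = [(1, 0.0), (2, 0.1), (1, 0.2), (20, 3.0), (22, 2.5), (19, 3.2)]
labels, centers = kmeans(users, k=2)
```

In practice the features would be scaled first, and the number of clusters chosen by inspection, since the resulting groups still have to be labeled by hand.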

Page 8: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Labeling

Search for a specific Topic

Newcomer from Google News

Foreigner Discovering The Site

Fan who loves to comment

Home Page Wanderer

Dark Bot (Competitor?)

Here you need your business intelligence

Page 9: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Compute metrics per segment

Search for a specific Topic

Newcomer from Google News

Foreigner Discovering The Site

Fan who loves to comment

Home Page Wanderer

Dark Bot (Competitor?)

0.3€ per session

0.23€ acquisition costs

13k sessions

1.3€ per session

0.23€ acquisition costs

938k sessions

938k sessions

0.3€ per session

0.23€ acquisition costs

738k sessions

0.83€ per session

0.73€ acquisition costs 68k sessions

0.3€ per session

1.23€ acquisition costs

1k sessions

0€ per session

0€ acquisition costs

Here you need to cross with your CRM
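
Crossing tracker data with the CRM can be pictured as a simple join followed by a per-segment aggregate. The sketch below assumes the CRM can export revenue keyed by session id. That keying, the function name and the sample values are assumptions for illustration.

```python
from collections import defaultdict


def metrics_per_segment(sessions, segment_of, crm_revenue):
    """Return {segment: (session_count, revenue_per_session)}.

    sessions: list of (session_id, visitor_id) pairs from the tracker.
    segment_of: visitor_id -> segment label (output of the labeling step).
    crm_revenue: session_id -> revenue in euros (assumed CRM export format).
    """
    counts = defaultdict(int)
    revenue = defaultdict(float)
    for session_id, visitor_id in sessions:
        segment = segment_of[visitor_id]
        counts[segment] += 1
        revenue[segment] += crm_revenue.get(session_id, 0.0)
    return {seg: (counts[seg], revenue[seg] / counts[seg]) for seg in counts}


# Hypothetical inputs.
sessions = [("s1", "v1"), ("s2", "v1"), ("s3", "v2")]
segment_of = {"v1": "Home Page Wanderer", "v2": "Dark Bot (Competitor?)"}
crm_revenue = {"s1": 0.5, "s2": 1.5}
result = metrics_per_segment(sessions, segment_of, crm_revenue)
```

Acquisition costs per segment could be added the same way, given a cost figure per session or per traffic source.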

Page 10: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Track metrics over time

Search for a specific Topic

Newcomer from Google News

Foreigner Discovering The Site

Fan who loves to comment

Home Page Wanderer

Dark Bot (Competitor?)

Using your already-computed segments

Damn, our latest release has diverging effects on segments

Page 11: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

A few other examples

• Churn prediction and explanation

• Customer lifetime value prediction

Page 12: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

OK, I WANT TO DO IT

Page 13: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

So, I have these Apache logs

• First level of web tracking

• "Nothing required"

Page 14: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Are backend logs a solution ?

Challenge 1 : Identify a visitor

• IP ?

– NAT / Proxy

– Not everyone has a public IP address

• IP + user-agent ?

– Big companies ! (many employees share one IP and the same browser)
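
To make the limits of IP + user-agent identification concrete, here is a small sketch. The hashing scheme, the function name and the sample addresses are assumptions for this example. It shows two people collapsing into one "visitor" and one person splitting into two.

```python
import hashlib


def naive_visitor_key(ip, user_agent):
    """Derive a pseudo visitor id from IP + user-agent (an approximation only)."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()[:12]


# Two different employees behind the same corporate proxy, same managed browser:
alice = naive_visitor_key("203.0.113.7", "Mozilla/5.0 (Windows NT 6.1) CorpBrowser")
bob = naive_visitor_key("203.0.113.7", "Mozilla/5.0 (Windows NT 6.1) CorpBrowser")

# The same person switching networks (office, then mobile):
carol_office = naive_visitor_key("203.0.113.7", "Mozilla/5.0 (iPhone)")
carol_mobile = naive_visitor_key("198.51.100.23", "Mozilla/5.0 (iPhone)")
```

Here `alice` and `bob` get the same key, while `carol_office` and `carol_mobile` get different ones, so the key both under-counts and over-counts visitors.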

Page 15: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Are backend logs a solution ?

Challenge 2 : Re-create sessions

• Using expiration times

• Advanced SQL / Hive / … makes this easier
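
Re-creating sessions with an expiration time amounts to sorting each visitor's hits by time and starting a new session whenever the gap exceeds a timeout. The sketch below assumes a 30-minute timeout and a `(visitor_key, unix_timestamp)` input format; both are choices made for this example.

```python
from itertools import groupby
from operator import itemgetter

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; 30 minutes is an assumed value


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Assign a session number to each (visitor_key, unix_timestamp) hit.

    Returns (visitor_key, timestamp, session_number) tuples sorted by visitor
    then time. A new session starts when the gap since the visitor's previous
    hit exceeds `timeout`.
    """
    out = []
    for visitor, group in groupby(sorted(hits), key=itemgetter(0)):
        session, previous = 0, None
        for _, ts in group:
            if previous is not None and ts - previous > timeout:
                session += 1
            out.append((visitor, ts, session))
            previous = ts
    return out


hits = [("a", 0), ("a", 600), ("a", 4000), ("b", 100)]
sessions = sessionize(hits)
```

In SQL or Hive the same comparison with the previous hit is typically written with a `LAG` window function partitioned by visitor.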

Page 16: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Are backend logs a solution ?

Challenge 3 : single-page webapps

• Track behaviour within each page

• Track events, not pages

Also: getting logs from IT is sometimes another challenge

Page 17: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Client-side tracking

• visitor_id and session_id handled with cookies

• Tracking page loads and various events

• Historically, "tracking" = fetching a 1x1 image

• Today: AJAX calls

[Diagram: www.website.com, Browser and tracker.com, with arrows labeled "JS tracking code" and "Tracking calls"]

Page 18: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Are cookies good for your (web) health ?

• Each cookie belongs to a domain (and its subdomains)

• Who can write a cookie ?

– The HTTP server, who becomes owner (via the Set-Cookie HTTP header)

– JS code running on the "owner" domain

• Who can read a cookie ?

– The owner HTTP server (sent by the browser)

– JS code running on the "owner" domain

Page 19: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

First-party cookies

• Set by the originating server (HTTP) or JS code

• Belong to the originating domain

• Sent by HTTP to the originating domain only

• Readable by JS code

[Diagram: www.website.com, Browser, tracker.com]

Browser state: Cookies for www.website.com: None

Browser → www.website.com: GET / (Cookies: none)

www.website.com → Browser: Contents

Browser → tracker.com: Fetch tracking script

Tracking JS code: read cookies for www.website.com

Tracking JS code: create visitor id and set cookie

Page 20: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

[Diagram: www.website.com, Browser, tracker.com]

Browser state: Cookies for www.website.com: visitor_id=d37ecba

JS code: send AJAX request to tracker.com with visitor_id

Browser → tracker.com: GET /track?visitor_id=d37ecba (Cookies: None)

Page 21: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Third-party cookies

• Set (in HTTP) by the tracker's domain

– Belong to the tracker's domain

• Not sent by HTTP to the originating domain (they do not belong to it)

• NOT readable by JS code on the originating site (they do not belong to it)

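[Diagram: www.website.com, Browser, tracker.com]

Browser state: Cookies for www.website.com: None; Cookies for tracker.com: None

Browser → www.website.com: GET / (Cookies: none)

www.website.com → Browser: Contents

Browser → tracker.com: Fetch tracking script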

Page 22: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

[Diagram: www.website.com, Browser, tracker.com]

Browser state: Cookies for www.website.com: None; Cookies for tracker.com: None

Browser → tracker.com: GET /track (Cookies: None)

Tracker code: assign visitor_id

tracker.com → Browser: 200 OK, Set-Cookie: visitor_id=33d7

Page 23: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

[Diagram: www.website.com, Browser, tracker.com]

Browser state: Cookies for www.website.com: None; Cookies for tracker.com: visitor_id=33d7

Browser → tracker.com: GET /track (Cookies: visitor_id=33d7)

Tracker code: read visitor_id

tracker.com → Browser: 200 OK
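
The tracker side of this third-party exchange can be sketched as a single handler: assign a `visitor_id` and return `Set-Cookie` on the first call, then read the id back from the `Cookie` header on later calls. The function name, the id length and the cookie attributes below are assumptions for illustration, not WT1's actual behaviour.

```python
import secrets
from http.cookies import SimpleCookie


def handle_track(cookie_header):
    """Return (visitor_id, response_headers) for one GET /track request.

    cookie_header is the raw Cookie request header, or None if absent.
    """
    cookies = SimpleCookie(cookie_header or "")
    if "visitor_id" in cookies:
        # Returning visitor: the browser sent the tracker's own cookie back.
        return cookies["visitor_id"].value, []
    # First call: assign an id and ask the browser to store it for the tracker's domain.
    visitor_id = secrets.token_hex(4)
    set_cookie = SimpleCookie()
    set_cookie["visitor_id"] = visitor_id
    set_cookie["visitor_id"]["path"] = "/"
    return visitor_id, [("Set-Cookie", set_cookie["visitor_id"].OutputString())]


# First request carries no cookie; the second replays the one just set.
first_id, first_headers = handle_track(None)
second_id, second_headers = handle_track(f"visitor_id={first_id}")
```

Because the cookie belongs to the tracker's domain, the same `visitor_id` comes back from every website that embeds this tracker, which is what enables cross-site tracking.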

Page 24: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Why each ?

First party cookie

• Tracks on a single website

• Requires JS code for tracking

• Reduced privacy impact: no exchange of information between sites

• Usage: track your users' behaviour

• Rarely blocked (used for logins)

Third party cookie

• Tracks across all websites using the same tracker

• More frowned upon

• Usage: generally ads, but also multi-website tracking

• Blocked by up to 40% of visitors

Page 25: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

What are your obligations ?

With ALL cookies

• You should ask users whether they accept cookies

• Even non-tracking related cookies

• Yes, even login-related ones

Page 26: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

What are your obligations ?

With third party cookies

• Obey the Do-Not-Track header

[Diagram: www.website.com, Browser, tracker.com]

Browser → tracker.com: GET /track (Cookies: None, DNT: 1)

Tracker code: DO NOTHING

tracker.com → Browser: 200 OK
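
Honouring Do-Not-Track on the tracker side can be as small as one header check before recording anything. In the sketch below the function names and the `record` callback are hypothetical. Only the `DNT: 1` request header comes from the slide.

```python
def should_track(request_headers):
    """Return False when the browser sent DNT: 1, True otherwise."""
    normalized = {name.lower(): value.strip() for name, value in request_headers.items()}
    return normalized.get("dnt") != "1"


def handle_track_request(request_headers, record):
    """Answer 200 OK either way, but only record the hit when tracking is allowed."""
    if should_track(request_headers):
        record(request_headers)
    return 200


recorded = []
status_dnt = handle_track_request({"DNT": "1"}, recorded.append)
status_plain = handle_track_request({"User-Agent": "demo"}, recorded.append)
```

Both calls return 200, but only the second one ends up in `recorded`.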

Page 27: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

What are your obligations ?

With third party cookies

• Provide an opt-out URL

• Allows the user to call /optin, /optout or /status

See in action : www.youronlinechoices.com
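
One way to picture the three opt-out URLs is a tiny dispatcher that stores the user's choice in a cookie on the tracker's domain. The cookie name `optout`, its values and the response texts are hypothetical choices for this sketch.

```python
def handle_optout_request(path, cookies):
    """Return (body, cookies_to_set) for /optin, /optout or /status.

    cookies: dict of cookies the browser sent to the tracker's domain.
    The cookie name "optout" is a hypothetical choice for this sketch.
    """
    if path == "/optout":
        return "opted out", {"optout": "1"}
    if path == "/optin":
        return "opted in", {"optout": "0"}
    if path == "/status":
        state = "opted out" if cookies.get("optout") == "1" else "opted in"
        return state, {}
    raise ValueError(f"unknown path: {path}")


# Simulated browser cookie jar for the tracker's domain.
jar = {}
_, to_set = handle_optout_request("/optout", jar)
jar.update(to_set)
status_after_optout, _ = handle_optout_request("/status", jar)
status_default, _ = handle_optout_request("/status", {})
```

The tracking endpoint would then check the same cookie and skip recording for opted-out visitors.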

Page 28: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

What are your obligations ?

With third party cookies

• Provide a P3P policy

• Otherwise, older versions of IE block your cookies

"What are you doing with my data ?"

Looks like this:

CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"

Page 29: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Tracking in mobile apps

• Preserve battery

– Each network call is costly

– Do not track everything synchronously

• Network access is intermittent

– Queue events and wait for network access
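
The two constraints above (few network calls, intermittent access) suggest a local queue that is flushed in batches. The class below is a minimal sketch of that idea. The class name, the `send_batch` callback and the batch size are assumptions, and it is not the WT1 iOS library's API.

```python
class EventQueue:
    """Buffers events and sends them in batches when the network is available."""

    def __init__(self, send_batch, batch_size=20):
        self.send_batch = send_batch  # callable(list_of_events) -> True if delivered
        self.batch_size = batch_size
        self.pending = []

    def track(self, event):
        """Record an event locally; never touches the network directly."""
        self.pending.append(event)

    def flush(self, network_available):
        """Send pending events in batches; keep whatever could not be delivered."""
        while network_available and self.pending:
            batch = self.pending[: self.batch_size]
            if not self.send_batch(batch):
                break  # delivery failed: retry on the next flush
            del self.pending[: len(batch)]
        return len(self.pending)


sent_batches = []


def fake_send(batch):
    """Stand-in for the real network call; always succeeds."""
    sent_batches.append(batch)
    return True


queue = EventQueue(fake_send, batch_size=2)
for name in ("open", "scroll", "close"):
    queue.track({"event": name})
left_offline = queue.flush(network_available=False)
left_online = queue.flush(network_available=True)
```

Three events are tracked while offline and nothing is sent. Once the network is back they go out as two batches instead of three separate calls.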

Page 30: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

So, what are my choices ?

• You might really want to be your own web tracker

• Most-used open source web tracker: Piwik

• Provides both raw data and nice dashboards

– MySQL backend

– Raw data via API

– Slightly less suited for analytics

Page 31: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

WT1: YOUR OWN TRACKER IN MINUTES

Page 32: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

WT1

An open source (Apache License) server to build your own web tracking

https://github.com/dataiku/wt1

• Designed to provide you with raw data, directly usable for analytics

• Very high performance and scalability

Page 33: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Features

• 1st or 3rd party cookies

– Handling of DNT and opt-out

– Helps with P3P handling

• Track events or pages with key-value data

• Visitor-scope and session-scope variables

• "Live view" debugging console

Page 34: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Features

• Dashboards: None

• Events processing and storage

– Filesystem, S3

– Event queues: Flume

– Custom processors

• JSON API for custom tracking

• iOS library

Page 35: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Architecture

Client-side JS tracker

• 1st or 3rd party cookies

• Event-level tracking

iOS library

• Automatic batching

• Queuing to deal with network interruptions

Clients → WT1 Server: JSON POST

WT1 Server

• Java

• > 20K events / second

• Handles DNT, P3P, opt-out, …

Raw storage

• Filesystem

• S3

Event processors

• Real-time aggregations

• Custom code

Event queues

• Flume

• Kafka, RabbitMQ, …

Page 36: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Future work

• Android library

• More event queues supported out of the box

– Kafka

– RabbitMQ

• Avro storage

Page 37: OWF14 - Big Data Track : Take back control of your web tracking Go further by doing it yourself

Thank you !

Clément Stenac [email protected] @ClementStenac
