Post on 02-Jul-2018

Realtime visitor analysis with Couchbase and Elasticsearch

Jeroen Reijn | @jreijn | #nosql13

follow the Hippo trail

NoSQL Matters 2013

About me

Jeroen Reijn

Software engineer

Hippo

@jreijn

http://blog.jeroenreijn.com

About Hippo

Visitor Analysis

Journey based Targeting

How we analyse visitors @ Hippo

Registration

Visitor - entity making HTTP requests

Collector - records data about a visitor or his behaviour

Example: location collector (GeoIPCollector)

Targeting Data - all data about a specific visitor

Example: IP address is located in Amsterdam

Matching

Characteristic - a type of fact about visitors

Example: "comes from a city", "experiences a type of weather"

Target Group - the specification of a Characteristic

Example: "comes from a European city", "comes from Amsterdam"

Persona - one or more target groups that describe a certain type of visitor

Example: "Jim, the European urban consumer", "Alice, the Pet owner"
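
The relationship between targeting data, target groups and personas can be illustrated with a minimal sketch. It is not Hippo's implementation: the class names, the `pet` characteristic and the scoring rule (fraction of matching target groups) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetGroup:
    """A specification of a characteristic, e.g. 'comes from a European city'."""
    name: str
    characteristic: str          # hypothetical key into the targeting data, e.g. "city"
    accepted_values: frozenset   # values of the characteristic that match this group

    def matches(self, targeting_data: dict) -> bool:
        return targeting_data.get(self.characteristic) in self.accepted_values


@dataclass(frozen=True)
class Persona:
    """One or more target groups that describe a certain type of visitor."""
    name: str
    target_groups: tuple

    def score(self, targeting_data: dict) -> float:
        # Assumed scoring rule: the fraction of target groups the visitor matches.
        if not self.target_groups:
            return 0.0
        hits = sum(group.matches(targeting_data) for group in self.target_groups)
        return hits / len(self.target_groups)


european_city = TargetGroup(
    "comes from a European city", "city", frozenset({"Amsterdam", "Paris", "Berlin"})
)
pet_owner = TargetGroup("owns a pet", "pet", frozenset({"dog", "cat"}))

jim = Persona("Jim, the European urban consumer", (european_city,))
alice = Persona("Alice, the Pet owner", (pet_owner,))

# Targeting data as a location collector might have recorded it.
targeting_data = {"city": "Amsterdam"}
best_persona = max([jim, alice], key=lambda persona: persona.score(targeting_data))
print(best_persona.name)
```

With only a city collected, the visitor matches Jim's single target group and none of Alice's, so Jim is selected.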

What do we store?

• Request log

• Targeting data

• Statistics: averages, e.g. how many visitors became which persona

Real-time analysis

How about YOU?

• Do you analyse your visitors?

• Do you do it ‘real-time’?

Architecture

RDBMS

Hippo Delivery Tier

Hippo Repository

App server

XML, JSON, (X)HTML

Delivery Tier

Request → URL Matching → Fetch content → Compose output → Response

Delivery Tier

Request → URL Matching → Collect data → Scoring → Fetch content → Compose output → Response
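
The request flow with data collection and scoring can be sketched as a chain of small functions. This is only an illustration: the step order, the function names, the lookup tables and the persona identifiers are assumptions, not Hippo's delivery tier API.

```python
# Hypothetical lookup tables standing in for a GeoIP database and a content repository.
GEO_LOOKUP = {"192.0.2.10": "Amsterdam"}
EUROPEAN_CITIES = {"Amsterdam", "Paris", "Berlin"}
CONTENT = {
    ("home", "default"): "Welcome!",
    ("home", "european-visitor"): "Welcome, European visitor!",
}


def match_url(request: dict) -> str:
    # URL matching: map the request path to a logical page name.
    return request["path"].strip("/") or "home"


def collect_data(request: dict, targeting_data: dict) -> dict:
    # Collect data: a stand-in for a collector such as a GeoIP collector.
    city = GEO_LOOKUP.get(request["ip"])
    if city is not None:
        targeting_data["city"] = city
    return targeting_data


def score(targeting_data: dict) -> str:
    # Scoring: pick a persona based on the collected targeting data.
    if targeting_data.get("city") in EUROPEAN_CITIES:
        return "european-visitor"
    return "default"


def fetch_content(page: str, persona: str) -> str:
    # Fetch content: prefer a persona-specific variant, fall back to the default.
    return CONTENT.get((page, persona), CONTENT.get((page, "default"), "Not found"))


def compose_output(content: str) -> str:
    return f"<html><body>{content}</body></html>"


def handle_request(request: dict, targeting_data: dict) -> str:
    page = match_url(request)
    collect_data(request, targeting_data)
    persona = score(targeting_data)
    return compose_output(fetch_content(page, persona))


print(handle_request({"path": "/", "ip": "192.0.2.10"}, {}))
```

Because collection and scoring run before content is fetched, the same URL can yield a different variant for each visitor.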

Scaling

Scaling out

RDBMS

App server: Hippo Delivery Tier, Hippo Repository

App server: Hippo Delivery Tier, Hippo Repository

Scaling out

RDBMS

App server: Delivery Tier, Repository

App server: Delivery Tier, Repository

Targeting Datastore

What kind of storage?

Typical Data Access Pattern

Writer → single write → Datastore → several reads

Analytics Data Access Pattern

Writers → several writes → Datastore → single read → CMS user

Targeting Data Access Pattern

Visitors ↔ Datastore: several writes, several reads

Datastore → CMS user: single read

Distributed Cache

Requirements change!

NoSQL?

Suitable types

• Key-value store

• Document database

• Column-oriented store

Assessment Criteria

• Maturity

• Data model

• Consistency model

• Performance

• Replication

• Caching model

• Query model

• Monitoring

• Scalability

• Reliability

• Support

Selection Criteria

• Performance

• Scalability

• Schema flexibility

• Simplicity

Couchbase

Why Couchbase?

• Drop-in replacement for memcached

• Read/Write-through cache

• High throughput

• Easily scalable

• Schema flexibility

• Low latency
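
The read/write-through idea from the list above can be sketched in a few lines. This is a generic illustration using plain dictionaries, not the Couchbase client API; the class name and the `store_reads` counter are assumptions added for the example.

```python
class ReadWriteThroughCache:
    """Callers talk only to the cache; the cache talks to the backing store."""

    def __init__(self, backing_store: dict):
        self._cache = {}
        self._store = backing_store
        self.store_reads = 0  # counts how often the backing store had to be consulted

    def get(self, key):
        # Read-through: on a cache miss, load from the backing store and keep a copy.
        if key not in self._cache:
            self.store_reads += 1
            if key not in self._store:
                return None
            self._cache[key] = self._store[key]
        return self._cache[key]

    def set(self, key, value):
        # Write-through: update the backing store and the cache in one operation.
        self._store[key] = value
        self._cache[key] = value


backing_store = {"visitor:1": {"city": "Amsterdam"}}
cache = ReadWriteThroughCache(backing_store)

cache.get("visitor:1")   # first read goes to the backing store
cache.get("visitor:1")   # second read is served from the cache
cache.set("visitor:2", {"city": "Paris"})
print(cache.store_reads)
```

The application never addresses the backing store directly, which is what makes such a store usable as a replacement for a separate caching layer.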

Couchbase

• Open Source

• Document-oriented

• Easily scalable

• Consistent High Performance

• Apache licensed

Performance

• Object managed cache

• Write Queue to disk
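
A managed in-memory cache combined with a disk write queue can be sketched as follows. It illustrates the general write-behind technique only, not Couchbase internals; the class name, the JSON-lines log format and the temporary file location are assumptions.

```python
import json
import os
import queue
import tempfile
import threading


class WriteBehindStore:
    """Serves reads and writes from memory; persists writes asynchronously."""

    def __init__(self, path: str):
        self._memory = {}
        self._pending = queue.Queue()
        self._path = path
        writer = threading.Thread(target=self._drain, daemon=True)
        writer.start()

    def set(self, key, value):
        # The write is visible in memory immediately; disk persistence is queued.
        self._memory[key] = value
        self._pending.put((key, value))

    def get(self, key):
        return self._memory.get(key)

    def _drain(self):
        # Background writer: append each queued mutation to an on-disk log.
        while True:
            key, value = self._pending.get()
            with open(self._path, "a", encoding="utf-8") as log:
                log.write(json.dumps({"key": key, "value": value}) + "\n")
            self._pending.task_done()

    def flush(self):
        # Block until every queued write has reached the log file.
        self._pending.join()


log_path = os.path.join(tempfile.mkdtemp(), "store.log")
store = WriteBehindStore(log_path)
store.set("visitor:1", {"city": "Amsterdam"})
in_memory = store.get("visitor:1")   # available before the disk write completes
store.flush()

with open(log_path, encoding="utf-8") as log:
    persisted = [json.loads(line) for line in log]
```

The trade-off of this design is that callers get memory-speed latency, while a crash before the queue drains can lose the most recent writes.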

Easily scalable

• Auto sharding

• Cross cluster replication (XDCR)

• Master-master replication
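
Auto sharding can be illustrated with a simplified sketch of hash-based partitioning. Couchbase hashes document keys onto a fixed number of partitions (vBuckets) that are mapped to nodes; the plain CRC32-modulo hash and the round-robin partition map below are simplifying assumptions, not the exact algorithm the server or its clients use.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024  # fixed number of partitions, independent of the cluster size


def vbucket_for(key: str) -> int:
    # Simplified hash: the real clients derive the partition from a CRC32 as well,
    # but with additional bit manipulation that is omitted here.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_vbucket_map(nodes: list) -> list:
    # Hypothetical assignment: spread the partitions round-robin over the nodes.
    return [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]


def node_for(key: str, vbucket_map: list) -> str:
    return vbucket_map[vbucket_for(key)]


nodes = ["node-a", "node-b", "node-c"]
vbucket_map = build_vbucket_map(nodes)
partitions_per_node = Counter(vbucket_map)

print(node_for("visitor:1", vbucket_map))
print(partitions_per_node)
```

Because a key always hashes to the same partition, adding a node only requires moving partitions and updating the map; the keys themselves never need to be rehashed.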

Flexible data model

• Native JSON support

• Incremental MapReduce

• Gives power to the developer

How we run Couchbase @ Hippo

Load Balancer

Database cluster

Hippo Delivery Tier

Couchbase cluster

• Request log data

• Targeting data

• Statistics data

Analysis capabilities

• Querying via views

• Secondary indexes via views

• Views based on MapReduce

• Limited ad-hoc query capabilities
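
How a map/reduce view answers a question such as "how many visitors became which persona" can be simulated in a few lines. Couchbase views are written as JavaScript map and reduce functions and are maintained incrementally by the server; the Python below only mimics the idea by recomputing everything, and the document fields (`type`, `persona`) are hypothetical.

```python
from collections import defaultdict


def map_persona(doc: dict):
    # Map step: emit one (key, value) row per targeting document that has a persona.
    if doc.get("type") == "targeting" and "persona" in doc:
        yield doc["persona"], 1


def reduce_count(values: list) -> int:
    # Reduce step: sum the emitted values per key.
    return sum(values)


def query_view(docs: list, map_fn, reduce_fn) -> dict:
    # Group the emitted rows by key, then reduce each group (keys in sorted order).
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


docs = [
    {"type": "targeting", "visitor": "v1", "persona": "jim"},
    {"type": "targeting", "visitor": "v2", "persona": "alice"},
    {"type": "targeting", "visitor": "v3", "persona": "jim"},
    {"type": "request", "path": "/"},
]
persona_counts = query_view(docs, map_persona, reduce_count)
print(persona_counts)
```

The view has to be defined before it can be queried, which is why ad-hoc questions that were not anticipated in a map function are hard to answer this way.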

Elasticsearch

• Apache Lucene

• Designed to be distributed

• Schema free

• Apache license

• RESTful API

Added value

• Unstructured search

• Structured search

• Faceted search

• Geospatial search

• Combine them all

• All in (near) real-time
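
Combining these search types in a single request might look like the sketch below, which only builds a JSON query body and does not contact a cluster. The field names (`page_title`, `persona`, `location`, `city`) are hypothetical, and `location` is assumed to be mapped as a geo point. At the time of this talk faceting used the facets API; later Elasticsearch versions replaced it with aggregations, which this sketch uses.

```python
import json


def build_visitor_query(text: str, persona: str, lat: float, lon: float,
                        distance: str = "50km") -> dict:
    return {
        "query": {
            "bool": {
                # Unstructured search: full-text match on a text field.
                "must": [{"match": {"page_title": text}}],
                "filter": [
                    # Structured search: exact match on a keyword field.
                    {"term": {"persona": persona}},
                    # Geospatial search: visitors within a distance of a point.
                    {"geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }},
                ],
            }
        },
        # Faceted search: count the matching visitors per city.
        "aggs": {"visitors_per_city": {"terms": {"field": "city"}}},
    }


query_body = build_visitor_query("pricing", "jim", lat=52.37, lon=4.89)
print(json.dumps(query_body, indent=2))
```

The resulting body would be sent to an index's search endpoint over the REST API; the index name and mapping depend on how the replicated documents are stored.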

Hippo Delivery Tier → Java API → Couchbase Server Cluster (write / read)

Couchbase Server Cluster → XDCR replication → Couchbase Transport plugin → Elasticsearch Server Cluster

Hippo Delivery Tier → Elasticsearch Server Cluster (read / query)

What’s Next?

Advanced analytics

{ Demo }

Thanks!

j.reijn@onehippo.com @jreijn