Building a relevance platform with Couchbase and Elasticsearch
OneHippo @ Goto
follow the Hippo trail
Building a relevance platform with Couchbase and Elasticsearch
@jreijn | Hippo
#gotoams, June 18
About me
• Architect @ Hippo
• DevOps guy
• Blogger @ http://blog.jeroenreijn.com
About Hippo
Relevance?
“The capability of a search engine or function to
retrieve data appropriate to a user's needs.”
http://www.thefreedictionary.com/relevance
How we deliver relevant content @Hippo
Registration
Visitor - entity making HTTP requests
Collector - records data about a visitor or their behavior
Example: location collector (GeoIPCollector)
Targeting Data - all data about a specific visitor
Example: the visitor's IP address is located in Amsterdam
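To make this vocabulary concrete, here is a minimal Java sketch of how collectors could turn a request into targeting data. The `Collector` interface, `CityCollector` (a stand-in for a GeoIPCollector that looks addresses up in an in-memory table instead of a real GeoIP database) and `TargetingDataDemo` are hypothetical names for illustration, not Hippo's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a Collector inspects a request and contributes one entry
// to the visitor's targeting data, stored under the collector's id.
interface Collector {
    String id();

    Object collect(Map<String, String> request);
}

// Hypothetical stand-in for a GeoIP collector: resolves the remote address
// against a small in-memory table instead of a real GeoIP database.
class CityCollector implements Collector {
    private final Map<String, String> ipToCity;

    CityCollector(Map<String, String> ipToCity) {
        this.ipToCity = ipToCity;
    }

    public String id() {
        return "geo";
    }

    public Object collect(Map<String, String> request) {
        return ipToCity.getOrDefault(request.get("remoteAddr"), "");
    }
}

class TargetingDataDemo {
    // Runs every collector against the request; the resulting map is the
    // visitor's targeting data.
    static Map<String, Object> collectAll(Map<String, String> request, Collector... collectors) {
        Map<String, Object> targetingData = new HashMap<>();
        for (Collector collector : collectors) {
            targetingData.put(collector.id(), collector.collect(request));
        }
        return targetingData;
    }

    public static void main(String[] args) {
        Collector geo = new CityCollector(Map.of("192.0.2.10", "Amsterdam"));
        Map<String, Object> data = collectAll(Map.of("remoteAddr", "192.0.2.10"), geo);
        System.out.println(data); // {geo=Amsterdam}
    }
}
```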
Matching
Characteristic - a type of fact about visitors
Example: "comes from a city", "experiences a type of weather"
Target Group - the specification of a Characteristic
Example: "comes from a European city", "comes from Amsterdam"
Persona - one or more target groups that describe a certain type of visitor
Example: "Jim, the European urban consumer",
"Alice, the Pet owner"
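A target group can be read as a predicate over targeting data, and a persona as a set of target groups. The Java sketch below scores a persona as the fraction of its target groups the visitor matches; `TargetGroup`, `Persona` and that scoring rule are assumptions for illustration, not how Hippo actually weighs target groups.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model: a target group is a named predicate over targeting data,
// e.g. "comes from Amsterdam" checks the value collected under "geo".
record TargetGroup(String name, Predicate<Map<String, Object>> matcher) {}

// A persona bundles one or more target groups.
record Persona(String id, List<TargetGroup> targetGroups) {
    // Assumed scoring rule: fraction of the persona's target groups the
    // visitor matches (0.0 = none, 1.0 = all).
    double score(Map<String, Object> targetingData) {
        if (targetGroups.isEmpty()) {
            return 0.0;
        }
        long matched = targetGroups.stream()
                .filter(group -> group.matcher().test(targetingData))
                .count();
        return (double) matched / targetGroups.size();
    }
}

class PersonaDemo {
    public static void main(String[] args) {
        TargetGroup fromAmsterdam = new TargetGroup("comes from Amsterdam",
                data -> "Amsterdam".equals(data.get("geo")));
        TargetGroup returning = new TargetGroup("returning visitor",
                data -> Boolean.TRUE.equals(data.get("returningvisitor")));
        Persona jim = new Persona("jim", List.of(fromAmsterdam, returning));

        Map<String, Object> visitor = Map.of("geo", "Amsterdam", "returningvisitor", false);
        System.out.println(jim.score(visitor)); // 0.5
    }
}
```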
What do we store?
Request log
Targeting data
Statistics
Averages, e.g. how many visitors became which persona
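These statistics are simple aggregates. Assuming each visitor is attributed to a single persona (an assumption; real scoring may assign several), the share of visitors per persona could be computed with a hypothetical helper like this:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PersonaStatistics {
    // Given the persona each visitor ended up with (one entry per visitor),
    // returns the share of visitors per persona.
    static Map<String, Double> personaShares(List<String> assignedPersonas) {
        Map<String, Double> shares = new TreeMap<>(); // sorted for stable output
        if (assignedPersonas.isEmpty()) {
            return shares;
        }
        for (String persona : assignedPersonas) {
            shares.merge(persona, 1.0, Double::sum); // count visitors per persona
        }
        double total = assignedPersonas.size();
        shares.replaceAll((name, count) -> count / total); // counts become shares
        return shares;
    }

    public static void main(String[] args) {
        System.out.println(personaShares(List.of("jim", "jim", "alice", "jim")));
        // {alice=0.25, jim=0.75}
    }
}
```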
Real-time analysis
Architecture
Diagram: an App server running the Hippo Delivery Tier and the Hippo Repository, with an RDBMS for storage; output is served as XML, JSON or (X)HTML
Delivery Tier
Request → URL Matching → Fetch content → Compose output → Response
Delivery Tier
Request → URL Matching → Targeting Data Collection → Scoring → Fetch content → Compose output → Response
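Scoring feeds content selection. As a sketch only, assuming editors create one content variant per persona and the delivery tier serves the variant of the best-scoring persona (falling back to a default), the selection could look like this; `VariantSelector` and the variant names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

class VariantSelector {
    // Hypothetical selection step: pick the variant of the highest-scoring
    // persona that has a variant and a positive score, else the default.
    static String selectVariant(Map<String, Double> personaScores,
                                Map<String, String> variantsByPersona,
                                String defaultVariant) {
        Optional<Map.Entry<String, Double>> best = personaScores.entrySet().stream()
                .filter(e -> variantsByPersona.containsKey(e.getKey()))
                .filter(e -> e.getValue() > 0.0)
                .max(Map.Entry.comparingByValue());
        return best.map(e -> variantsByPersona.get(e.getKey())).orElse(defaultVariant);
    }

    public static void main(String[] args) {
        Map<String, Double> scores = Map.of("jim", 0.5, "alice", 0.9);
        Map<String, String> variants = Map.of("jim", "city-banner", "alice", "pet-banner");
        System.out.println(selectVariant(scores, variants, "generic-banner")); // pet-banner
    }
}
```

Ties between equally scored personas are resolved arbitrarily here; a real implementation would need an explicit rule.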
Scaling
Scaling out
Diagram: two App servers, each running a Hippo Delivery Tier and a Hippo Repository, sharing one RDBMS
Scaling out
Diagram: two App servers, each running a Delivery Tier and a Repository, sharing one RDBMS and one Targeting Datastore
What kind of ‘storage’?
Distributed Cache?
We have a winner!
Requirements change!
NoSQL to the rescue
Suitable types
• Key-value store
• Document database
Assessment Criteria
• Maturity
• Data model
• Consistency model
• Performance
• Replication
• Caching model
• Query model
• Monitoring
• Scalability
• Reliability
• Support
Selection Criteria
• Performance!
• Scalability
• Schema flexibility
• Simplicity
• Monitoring
• Support
Performance!!
Scalability
Schema flexibility
Request log document
{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {
      "country": "",
      "city": "",
      "latitude": 0,
      "longitude": 0
    },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}
Visitor document
{
  "geo": {
    "collectorId": "geo",
    "city": "",
    "country": "",
    "latitude": 0,
    "longitude": 0
  },
  "channel": {
    "collectorId": "channel",
    "channels": [ "English Website" ],
    "lastVisitedChannel": "English Website"
  }
}
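Because each collector writes its own sub-document keyed by its `collectorId`, the visitor document can grow without a schema change. The Java sketch below models that with nested maps; `VisitorDocument` and the extra `weather` collector are hypothetical, used only to show a new key appearing.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class VisitorDocument {
    // One sub-document per collector, stored under the collector's id, as in
    // the visitor document. Nothing is declared up front, so a new collector
    // simply adds a new top-level key.
    private final Map<String, Map<String, Object>> subDocuments = new LinkedHashMap<>();

    void put(String collectorId, Map<String, Object> data) {
        Map<String, Object> sub = new LinkedHashMap<>();
        sub.put("collectorId", collectorId);
        sub.putAll(data);
        subDocuments.put(collectorId, sub);
    }

    Map<String, Map<String, Object>> asMap() {
        return subDocuments;
    }

    public static void main(String[] args) {
        VisitorDocument doc = new VisitorDocument();
        doc.put("channel", Map.of("lastVisitedChannel", "English Website",
                                  "channels", List.of("English Website")));
        // A hypothetical new "weather" collector needs no schema migration.
        doc.put("weather", Map.of("condition", "rain"));
        System.out.println(doc.asMap().keySet()); // [channel, weather]
    }
}
```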
Simplicity
Monitoring
Support
Couchbase
Why Couchbase?
• Drop-in replacement for memcached
• Read/Write-through cache
• High throughput
• Easy scalability
• Schema flexibility
• Low latency
Couchbase
• Open Source
• Document-oriented
• Easily scalable
• Consistently high performance
Performance
• Object-managed cache
• Write queue to disk
• Avoids a cold cache
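The write-queue idea can be illustrated with a deliberately simplified, single-threaded write-behind cache: writes are acknowledged once they are in memory and persisted later, while reads fall back to disk and warm the cache again. This is a sketch of the idea under those assumptions, not Couchbase's implementation; `WriteBehindCache` and the `visitor::` key prefix are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Simplified write-behind cache (hypothetical, single-threaded).
class WriteBehindCache {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> disk;                        // stand-in for persistent storage
    private final Set<String> writeQueue = new LinkedHashSet<>(); // dirty keys, in first-write order

    WriteBehindCache(Map<String, String> disk) {
        this.disk = disk;
    }

    void set(String key, String value) {
        memory.put(key, value); // acknowledged as soon as it is in memory
        writeQueue.add(key);    // persisted later; repeated writes to a key are coalesced
    }

    String get(String key) {
        String value = memory.get(key);
        if (value == null) {
            value = disk.get(key); // miss: read through to disk and warm the cache
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    // Drains the write queue to disk and returns how many keys were written.
    int flush() {
        for (String key : writeQueue) {
            disk.put(key, memory.get(key));
        }
        int written = writeQueue.size();
        writeQueue.clear();
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> disk = new HashMap<>();
        WriteBehindCache cache = new WriteBehindCache(disk);
        cache.set("visitor::7a1c7e75", "{\"channel\":\"English Website\"}");
        System.out.println(disk.containsKey("visitor::7a1c7e75")); // false: still queued
        cache.flush();
        // A fresh cache over the same disk serves the value from disk.
        WriteBehindCache restarted = new WriteBehindCache(disk);
        System.out.println(restarted.get("visitor::7a1c7e75")); // {"channel":"English Website"}
    }
}
```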
Easily scalable
• Auto sharding
• Cross data center replication (XDCR)
• Master-master replication
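Auto sharding in Couchbase works by hashing each document key into one of a fixed number of vBuckets (1024 by default) and mapping vBuckets to nodes, so a rebalance moves vBuckets rather than individual keys. The sketch below follows the CRC32-based key hashing that Couchbase clients use as we understand it; treat the exact bit manipulation and the round-robin vBucket-to-node layout as assumptions. `VBucketMapper` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class VBucketMapper {
    static final int NUM_VBUCKETS = 1024; // Couchbase Server's usual default vBucket count

    // Maps a document key to a vBucket via CRC32. The shift/mask mirrors the
    // client-side hashing as we understand it (assumption).
    static int vbucketFor(String key, int numVBuckets) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        long hash = (crc.getValue() >> 16) & 0x7fff;
        return (int) (hash % numVBuckets);
    }

    // Hypothetical cluster map: vBuckets assigned to nodes round-robin.
    // Adding a node means reassigning vBuckets, not rehashing keys.
    static int nodeFor(int vbucket, int numNodes) {
        return vbucket % numNodes;
    }

    public static void main(String[] args) {
        int vb = vbucketFor("visitor::7a1c7e75", NUM_VBUCKETS);
        System.out.println("vBucket " + vb + " on node " + nodeFor(vb, 3));
    }
}
```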
Flexible data model
• Native JSON support
• Incremental MapReduce
• Gives power to the developer
How we run Couchbase @Hippo
Diagram: Load Balancer, Hippo Delivery Tier, Database cluster and Couchbase cluster. The Couchbase cluster holds:
• Request log data
• Targeting data
• Statistics data
Query capabilities
• Querying via views
• Secondary indexes via views
• Views based on MapReduce
• Lacks some advanced query capabilities
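A view is built from a map function that emits rows per document and an optional reduce function over those rows; in Couchbase both are written in JavaScript and the index is maintained incrementally. To illustrate only the map/reduce shape, the Java sketch below recomputes, from scratch, a view that counts request log documents per persona, mimicking the built-in `_count` reduce queried with grouping. The `persona` field is a hypothetical simplification of the `personaIdScores` array in the request log document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ViewSimulation {
    // Map step: emit one (key, value) row per document that has a persona.
    static List<Map.Entry<String, Integer>> map(List<Map<String, Object>> docs) {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object persona = doc.get("persona");
            if (persona != null) {
                rows.add(Map.entry(persona.toString(), 1));
            }
        }
        return rows;
    }

    // Grouped reduce: count rows per key, like _count with grouping.
    // A TreeMap keeps keys sorted, as view results are sorted by key.
    static Map<String, Integer> reduceGrouped(List<Map.Entry<String, Integer>> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> row : rows) {
            counts.merge(row.getKey(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Object> jim = Map.of("persona", "jim");
        Map<String, Object> alice = Map.of("persona", "alice");
        Map<String, Object> anonymous = Map.of("pageUrl", "/news");
        System.out.println(reduceGrouped(map(List.of(jim, alice, jim, anonymous))));
        // {alice=1, jim=2}
    }
}
```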
Elasticsearch
• Built on Apache Lucene
• Designed to be distributed
• Schema-free
• Apache 2 licensed
• RESTful API
Added value of ES
• Full-text search
• Faceted search
• Geospatial search
• All in (near) real time
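For geospatial search, Elasticsearch can filter documents on a `geo_point` field with a `geo_distance` filter. To show what such a filter computes, here is a plain-Java sketch using the haversine formula; it is not Elasticsearch code, and Elasticsearch's own distance calculation may give slightly different numbers. `GeoDistance` and the visitor coordinates are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GeoDistance {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Great-circle distance between two points in decimal degrees (haversine).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Sorted ids of visitors located within radiusKm of (lat, lon).
    static List<String> within(Map<String, double[]> locations, double lat, double lon, double radiusKm) {
        return locations.entrySet().stream()
                .filter(e -> haversineKm(lat, lon, e.getValue()[0], e.getValue()[1]) <= radiusKm)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, double[]> locations = new HashMap<>();
        locations.put("visitor-rotterdam", new double[] {51.9244, 4.4777});
        locations.put("visitor-berlin", new double[] {52.5200, 13.4050});
        // Visitors within 100 km of Amsterdam
        System.out.println(within(locations, 52.3676, 4.9041, 100)); // [visitor-rotterdam]
    }
}
```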
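Replicating to ES
Diagram: the Hippo Delivery Tier writes and reads through the Java API against a Couchbase Server Cluster and an Elasticsearch Server Cluster; Couchbase replicates its data to Elasticsearch via XDCR and the Couchbase ES Transport plugin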
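The replication path can be pictured as a stream of mutations applied to the search index some time after the write is acknowledged, which is why search results are (near) real time. The single-threaded Java sketch below models only that data flow with two maps and a queue; it is not the XDCR protocol or the transport plugin, and `ReplicationSketch` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ReplicationSketch {
    private final Map<String, String> documentStore = new HashMap<>(); // stand-in for the Couchbase bucket
    private final Map<String, String> searchIndex = new HashMap<>();   // stand-in for the Elasticsearch index
    private final Deque<String> pendingMutations = new ArrayDeque<>(); // ids written but not yet replicated

    // Write path: store the document and queue the mutation for replication.
    void write(String id, String json) {
        documentStore.put(id, json);
        pendingMutations.add(id);
    }

    // One replication pass: copy the current version of each pending document.
    int replicate() {
        int replicated = 0;
        String id;
        while ((id = pendingMutations.poll()) != null) {
            searchIndex.put(id, documentStore.get(id));
            replicated++;
        }
        return replicated;
    }

    // Read path: a naive substring "search" over the indexed documents only.
    List<String> search(String term) {
        return searchIndex.entrySet().stream()
                .filter(e -> e.getValue().contains(term))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ReplicationSketch sketch = new ReplicationSketch();
        sketch.write("visitor::1", "{\"city\":\"Amsterdam\"}");
        System.out.println(sketch.search("Amsterdam")); // [] : not replicated yet
        sketch.replicate();
        System.out.println(sketch.search("Amsterdam")); // [visitor::1]
    }
}
```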
Demo time!
What’s Next?
Advanced analytics
Thank you!
Questions?
[email protected]
@jreijn
P.S. We’re hiring!