Using Couchbase and Elasticsearch as data layers

Sizmek MDX NXT: Using Couchbase and Elasticsearch as data layer

Tal Maayani, Yuval Perry

For more details


High level mission

Building a scalable, highly available, large and fast ad management platform

Main technologies used

  • AWS deployment
  • Microservices architecture

NoSQL is the most critical notion in the architecture

Data layer alternatives

                               Couchbase                      MongoDB
  Consistency concept (CAP)    Eventually consistent (CP);    Eventually consistent (CP)
                               using XDCR: AP
  Query language               Query API + N1QL               Query API (no DSL)
  Storage handling             Append only                    In-place update / replace


Why NoSQL?

  NoSQL document database      Relational database
  Unstructured data            Structured data
  Memory-first approach        Disk-first approach
  No transactions              Transactional
  Scales horizontally          Scales vertically

A NoSQL DB allows:

  • Fast reads and writes
  • Holding a variety of data models
  • Large data volumes
  • Cloud-friendly deployment
  • No single point of failure

You still need to take care of a transactionless, eventually consistent data source.

RDBMS optimization requires a DBA team. NoSQL is like a spaceship, whereas an RDBMS can be like a Rolls-Royce.


Why Couchbase: high level

  • Easy scalability: grow the cluster without application changes and without downtime, with a single click
  • Consistent, high performance: consistent sub-millisecond read and write response times with consistent high throughput
  • Always on 24x7x365: no downtime for software upgrades, hardware maintenance, etc.

Add N1QL

Why Couchbase?

  • JSON support
  • Indexing and querying
  • Cross data center replication
  • Incremental map reduce

Internal tool for Couchbase DB queries

Our data layer

  • Generic Data Access Layer
  • Save / Update
  • XDCR
  • N1QL: fetch documents by IDs

Demo: Sizmek Couchbase administrator tool

An in-house development tool that allows performing ES queries as well as N1QL queries.

Usage

  • Data investigation
  • Data migration

How we maintain atomicity on a transactionless data source

  • A transaction manager service maintains flow state across changes to multiple entities
  • Provides atomicity and tracking
  • Example: save smart version ad flow

Services: Dynamic Campaign Optimization, Transaction Manager, Asset Mgmt, Ad Service

Operations: Create Ad, Upload ad assets, Create Smart version

Flow states: Assets created, Ad created, Smart version created

Does not ensure isolation (ACID)
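
The slide does not say how the transaction manager achieves atomicity, so the following is only a minimal sketch of one common approach: run the steps of a flow in order, log each state change for tracking, and undo the completed steps in reverse if a later step fails. The class `FlowTransaction`, its `Step` type and the undo actions are hypothetical names introduced for illustration; they are not the Sizmek implementation. As on the slide, nothing here provides isolation: other readers can observe the intermediate states.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: a flow runner with compensating undo actions.
final class FlowTransaction {

    /** One step of a flow: an action plus the undo that compensates for it. */
    static final class Step {
        final String name;
        final Runnable action;
        final Runnable undo;

        Step(String name, Runnable action, Runnable undo) {
            this.name = name;
            this.action = action;
            this.undo = undo;
        }
    }

    // Tracking: every state change of the flow, in order.
    private final List<String> log = new ArrayList<>();

    /** Runs the steps in order; on failure undoes completed steps in reverse order. */
    boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
                log.add(step.name + " done");
            } catch (RuntimeException e) {
                log.add(step.name + " failed");
                while (!completed.isEmpty()) {
                    Step done = completed.pop();
                    done.undo.run();
                    log.add(done.name + " undone");
                }
                return false;
            }
        }
        return true;
    }

    List<String> log() {
        return log;
    }
}
```

For the save smart version ad flow, the steps would be "Upload ad assets", "Create ad" and "Create smart version", each paired with an undo that deletes what the step created.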


Elasticsearch consistency problem and how to overcome it in automatic testing

The problem
In a clustered Elasticsearch environment, a document update is not automatically reflected on all nodes. This caused inconsistent results in automatic testing.

Example
Change a campaign name from A to B. The automatic test verifies that the change actually took place by getting the entity and verifying its name.

Possible solutions

  • Wait a few seconds before checking for the updated status
  • Use Elasticsearch refresh to force an in-memory index update

A refresh effectively calls a reopen on the Lucene index reader, so that the point-in-time snapshot of the data that you can search on gets updated. This Lucene feature is part of the Lucene near-real-time API.

An Elasticsearch refresh makes your documents available for search, but it doesn't make sure that they are written to disk to persistent storage, as it doesn't call fsync, and thus doesn't guarantee durability. What makes your data durable is a Lucene commit (flush), which is way more expensive.

An Elasticsearch flush effectively triggers a Lucene commit, and also empties the transaction log, since once data is committed on the Lucene level, durability can be guaranteed by Lucene itself. Flush is exposed as an API too and can be tweaked, although usually that is not necessary. Flush happens automatically depending on how many operations get added to the transaction log, how big they are, and when the last flush happened.
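
For the "wait before checking" option, a test can poll until the expected value becomes visible instead of sleeping for a fixed number of seconds. The helper below is a minimal sketch of that idea using only the JDK; the class name `Eventually` and its parameters are hypothetical and not taken from the slides.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: poll a condition until it holds or a timeout elapses.
final class Eventually {

    /** Returns true as soon as the condition holds, false once the timeout has passed. */
    static boolean await(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the expected state
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

In the campaign rename example, the condition would re-run the search and compare the returned name with B, so the test passes as soon as the update is searchable and fails only after the timeout.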


Name uniqueness implementation: how to implement unique constraints using Couchbase

Problem
Maintain a unique entity name

Real use case
Keep advertisement names unique system-wide

Possible solution

  • Input: entity to save
  • Save uniqueness document (key: entity name, value: entity id)
  • Save succeeded?
  • Save entity
  • Return error
  • Delete uniqueness doc

Still need to take care of orphan uniqueness documents
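
The flow above can be sketched with an in-memory map standing in for the bucket. This is a sketch under stated assumptions, not the production code: `ConcurrentHashMap.putIfAbsent` plays the role of an insert that fails when the key already exists, the uniqueness document is deleted when saving the entity fails (one reading of the flowchart), and `UniqueNameStore` and `entityWriter` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical sketch of the uniqueness-document pattern with an in-memory stand-in.
final class UniqueNameStore {

    // Uniqueness documents: key = entity name, value = entity id.
    private final Map<String, String> uniquenessDocs = new ConcurrentHashMap<>();

    // Assumed hook that persists the entity itself: (entity id, entity name).
    private final BiConsumer<String, String> entityWriter;

    UniqueNameStore(BiConsumer<String, String> entityWriter) {
        this.entityWriter = entityWriter;
    }

    /** Returns true if the entity was saved, false if the name is already taken. */
    boolean save(String entityId, String name) {
        // Atomic insert: succeeds only if no uniqueness document exists for this name.
        if (uniquenessDocs.putIfAbsent(name, entityId) != null) {
            return false; // "Return error"
        }
        try {
            entityWriter.accept(entityId, name); // "Save entity"
            return true;
        } catch (RuntimeException e) {
            // "Delete uniqueness doc": release the name so it does not stay orphaned.
            uniquenessDocs.remove(name, entityId);
            throw e;
        }
    }

    boolean isNameTaken(String name) {
        return uniquenessDocs.containsKey(name);
    }
}
```

If the process dies between the two writes, the cleanup never runs, which is why orphan uniqueness documents still need separate handling.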

Query example: top 10 selling products

    SELECT product, SUM(items.count) AS unitsSold
    FROM purchases
    UNNEST purchases.lineItems AS items
    JOIN product ON KEYS items.product
    GROUP BY product
    ORDER BY unitsSold DESC
    LIMIT 10

Products:

    { "product": { "id": 1, "name": "iphone6S" } }

Purchases:

    { "purchase": { "lineItems": [ {"product": 1}, {"product": 2} ] } }

N1QL example

Use Query Workbench Developer Preview

Example queries

    select mvbucket.`key` from mvbucket
    where payload._type = 'AdSmartVersion'
      and payload.createdOn is not missing

    select * from mvbucket
    where payload._type = 'AdSmartVersion'
      and payload.masterAdId = 1073741825
      and payload.createdOn between 1349057369158 and 1449057369158

    select payload.masterAdId, count(1) from mvbucket
    where payload._type = 'AdSmartVersion'
      and payload.createdOn between 1349057369158 and 1449057369158
    group by payload.masterAdId

    select payload.masterAdId, count(1) as count from mvbucket
    where payload._type = 'AdSmartVersion'
    group by payload.masterAdId

Couchbase Java client 2: notes on the Java client

  • Built-in support for JSON documents
  • Supports counters
  • Asynchronous client using RxJava
      • Allows exploiting reactive business logic that is already in use
      • Efficient parallel processing
      • Inherent error handling, for example retrying a document get with an exponential backoff

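    import java.util.concurrent.TimeUnit;
    import com.couchbase.client.core.BackpressureException;
    import com.couchbase.client.core.time.Delay;
    import com.couchbase.client.java.util.retry.RetryBuilder;
    import rx.Observable;

    Observable
        .from(docIds)
        .flatMap(id -> {
            return bucket
                .async()
                .get(id)
                // retry only when the client signals backpressure
                .retryWhen(RetryBuilder
                    .anyOf(BackpressureException.class)
                    .delay(Delay.exponential(TimeUnit.MILLISECONDS, 100))
                    .max(10)
                    .build()
                );
        })
        .subscribe();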

The code above retries with an exponential backoff (with a 100 millisecond ceiling), but stops after 10 attempts and propagates the error. For more info see
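
To make the retry policy concrete without the SDK, here is a minimal plain-JDK sketch of exponential backoff with a ceiling and a bounded number of attempts. It is not the `RetryBuilder` implementation; the class `BackoffRetry` and the 1, 2, 4, ... millisecond schedule are assumptions chosen for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: bounded retries with exponential backoff and a delay ceiling.
final class BackoffRetry {

    /** Delay before retry number `attempt` (1-based): doubles each time, capped at the ceiling. */
    static long delayMillis(int attempt, long ceilingMillis) {
        // 1, 2, 4, 8, ... ms; the shift is clamped so it cannot overflow.
        long raw = 1L << Math.min(attempt - 1, 30);
        return Math.min(raw, ceilingMillis);
    }

    /** Calls the task until it succeeds or maxAttempts calls have failed. */
    static <T> T retry(Callable<T> task, int maxAttempts, long ceilingMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, ceilingMillis));
                }
            }
        }
        throw last; // attempts exhausted: propagate the last error
    }
}
```

With a 100 millisecond ceiling, the delay in this sketch grows until it reaches 100 ms and then stays there for all remaining attempts.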

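Example: get bulk documents

    // Element types are assumed from buildCouchbaseStorable; the slide shows raw List/Iterable.
    public List<CouchbaseStorable> findByIds(Iterable<String> keys) {
        return Observable
            .from(keys)
            // fetch all documents concurrently through the async API
            .flatMap(id -> bucket.async().get(id, RawJsonDocument.class))
            .map(this::buildCouchbaseStorable)
            .toList()
            // assumed ending (the slide is cut off here): block until all fetches finish
            .toBlocking()
            .single();
    }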

Our use of Elasticsearch: query engine

  • Free text search: user boolean queries
  • Data filtering: data grid filtering
  • Grouping: data grid grouping
  • Authorization: filter documents according to user permissions
  • Batch processing: internal services that use scan and scroll to operate on large data sets

Elasticsearch is usually used for free text and autocomplete

Elasticsearch: some best practices

  • Carefully maintain the index schema
      • Avoid using dynamic mapping
          • Data type collisions
          • Large data set: do not save data that is not used
      • Build a static schema from the data model
      • Updating a searchable field in the data model triggers a build of a new index
      • Some schema changes require re-indexing, e.g. adding a mandatory field or changing an enumeration value
  • Inconsistency: updated data does not immediately appear in query results
      • The overall system design must be aware of this limitation
  • Throttling: must control the number of writes

Show rebuild of static index with high availability

Couchbase 4.1: our usage

Use optimistic locking: update operations are done through an updater lambda function

N1QL

  • Does not meet performance requirements for large data sets with ORDER BY queries
  • Took more than 5 sec to query 250 entities
  • Used for business logic where no sorting is required
  • Used when consistency is important

XDCR

  • Customized plugin to index required entities
  • Add support for parent-child relationships in Elasticsearch
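
The optimistic locking with an updater lambda mentioned above can be sketched as a read, modify, compare-and-swap loop. The example below is a minimal in-memory illustration, not the Couchbase SDK: `CasStore`, `Versioned` and the retry limit are hypothetical, and a counter stands in for the CAS value that Couchbase returns with each document.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.UnaryOperator;

// Hypothetical sketch: optimistic locking through an updater function and CAS values.
final class CasStore {

    /** A stored value together with the CAS value it was written with. */
    static final class Versioned {
        final String value;
        final long cas;

        Versioned(String value, long cas) {
            this.value = value;
            this.cas = cas;
        }
    }

    private final Map<String, Versioned> docs = new ConcurrentHashMap<>();
    private final AtomicLong casCounter = new AtomicLong();

    void insert(String key, String value) {
        docs.put(key, new Versioned(value, casCounter.incrementAndGet()));
    }

    Versioned get(String key) {
        return docs.get(key);
    }

    /** Replaces the value only if the stored CAS still matches the one that was read. */
    boolean replace(String key, String newValue, long expectedCas) {
        Versioned current = docs.get(key);
        if (current == null || current.cas != expectedCas) {
            return false; // someone else wrote in between
        }
        return docs.replace(key, current, new Versioned(newValue, casCounter.incrementAndGet()));
    }

    /** Reads, applies the updater, writes with CAS; re-reads and retries on conflict. */
    String update(String key, UnaryOperator<String> updater, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            Versioned read = docs.get(key);
            if (read == null) {
                throw new IllegalArgumentException("no such document: " + key);
            }
            String updated = updater.apply(read.value);
            if (replace(key, updated, read.cas)) {
                return updated;
            }
        }
        throw new IllegalStateException("too many concurrent modifications: " + key);
    }
}
```

Passing the change as a function matters because the updater may run more than once: each retry re-applies it to the freshly read value rather than to a stale copy.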

Using reactive business logic: what is it and why use it?

  • Responsive: the system responds in a timely manner if at all possible.
  • Resilient: the system stays responsive in the face of failure.
  • Elastic: the system stays responsive under varying workload.
  • Message driven: reactive systems rely on asynchronous message-passing to establish a boundary between components.

Responsiveness is the cornerstone of usability and utility, but more than that, responsiveness means that problems may be detected quickly and dealt with effectively. Resilience is achieved by replication, containment, isolation and delegation. Elastic systems can react to changes in the input rate by increasing or decreasing the resources allocated to service these inputs. Being message driven ensures loose coupling, isolation and location transparency, and provides the means to delegate errors as messages.


Questions And answers

Thank you