Cassandra at Bazaarvoice - EmoDB

Confidential and Proprietary. © 2013 Bazaarvoice, Inc. EmoDB Store your feelings here www.bazaarvoice.com

description

Introducing the Bazaarvoice datastore (EmoDB)

EmoDB is a RESTful HTTP server used by Bazaarvoice for storing JSON objects and for watching for changes to those objects. It also supports a blob store, a queueing service, and a data bus to track events. It is designed to span multiple data centers, using eventual consistency (AP) and multi-master conflict resolution. It relies on Apache Cassandra for persistence and cross-data center replication.

About Bazaarvoice

Bazaarvoice is based on a simple truth: when people talk to each other, people buy stuff they are happy about because they trust the opinions of others. We see a day when all voices are connected and, together, help the marketplace function better. We’ve built a network that connects businesses together to amplify the authentic voices of people wherever they shop: online, in-store and mobile. Our mission, just like our name, is to be the "voice of the marketplace", one authentic conversation at a time. Each month, more than 450 million people view and share opinions, questions and experiences on more than 20 million products in the Bazaarvoice network. Our technology platform channels these voices into places that help consumers make purchasing decisions.

Our engineers have the opportunity to work on many areas of computer science, including distributed computing, natural language processing, big data analytics, large-scale system design, and user interface design, just to name a few. We use the latest and greatest technologies in Cloud, NoSQL, advanced client-side JavaScript, etc., to solve problems at a scale that few companies can offer.

About Fahd Siddiqui

Fahd Siddiqui is a Senior Software Engineer at Bazaarvoice in the data infrastructure team. His interests include highly scalable, distributed data systems. He holds a Master's degree in Computer Engineering from the University of Texas at Austin.

Transcript of Cassandra at Bazaarvoice - EmoDB

Page 1: Cassandra at Bazaarvoice - EmoDB

EmoDB: Store your feelings here

www.bazaarvoice.com

Page 2: Cassandra at Bazaarvoice - EmoDB

SaaS serving software that collects and displays user generated content, crunches analytics, and extracts insights.

Thousands of clients

Hundreds of millions of pieces of content

Hundreds of millions of unique visitors per month

Tens of billions of pageviews per month

Austin-based company founded in 2005

Engineering offices: Austin, San Francisco, New York

Page 3: Cassandra at Bazaarvoice - EmoDB

Fahd Siddiqui, Senior Software Engineer, Data Infrastructure

Bazaarvoice

linkedin.com/in/fahdsiddiqui

[email protected]

$ whoami

Page 4: Cassandra at Bazaarvoice - EmoDB

Global Monthly Unique Visitors

[Chart; per-bar labels not preserved. Values shown: 1B, 1B, 500M, 1B, 400M, 200M, 250M, 450M, 1B, 600M]

Page 5: Cassandra at Bazaarvoice - EmoDB

Monthly stats as of July 2013

16B, 1B

480M, 250M, 118M

3M, 4000, 2500

Review impressions

Pageviews (37k rps)

Unique users

Products in catalog

Total reviews

Monthly new reviews

Customer implementations

Servers

95 Engineers

Page 6: Cassandra at Bazaarvoice - EmoDB

Data Infrastructure

Page 16: Cassandra at Bazaarvoice - EmoDB

Goals for EmoDB

Store just about anything, in a flexible way

Support a “Universal Content Type”: store any content type without any re-architecture

Watch for changes to data (events)

Expose a RESTful API

Multi-master, multi-data center, fault tolerant, with horizontal scaling for reads and writes

Page 17: Cassandra at Bazaarvoice - EmoDB

EmoDB Overview

• System of Record
• Databus
• Queue Service
• Blob Store

... all backed by Cassandra

Page 18: Cassandra at Bazaarvoice - EmoDB

SoR - tables

What is an Emo table?
• It is a bucket that contains JSON documents.
• Creating one is cheap, and you may create as many as you want, e.g., review:testcustomer
• It offers a way to fetch any particular row by ID, and
• a complete table scan, which uses splits for parallel scans

Page 19: Cassandra at Bazaarvoice - EmoDB

SoR - tables

• Create a table

$ curl -s -XPUT -H "Content-Type: application/json" \
    "http://localhost:8080/sor/1/_table/review:testcustomer?options=placement:'ugc_global:ugc'&audit=comment:'initial+provisioning',host:aws-tools-02" \
    --data-binary '{"type":"review","client":"TestCustomer"}' | jsonpp
{
  "success": true
}

• Store a document

$ curl -s -XPUT -H "Content-Type: application/json" \
    "http://localhost:8080/sor/1/review:testcustomer/demo1?audit=comment:'initial+submission',host:aws-submit-09" \
    --data-binary '{"author":"Bob","title":"Best Ever!","rating":5}' | jsonpp
{
  "success": true
}

Page 20: Cassandra at Bazaarvoice - EmoDB

SoR – rows

• A row is composed of deltas
• Writers append deltas, and readers resolve deltas to produce a resolved object
• Compaction occurs once data has been replicated to all data centers
• Because of this, EmoDB is not a good fit for systems with a high update-to-create ratio

Page 21: Cassandra at Bazaarvoice - EmoDB

Data Access in EmoDB

3 ways to read data out of EmoDB:
• Lookup by primary key
• Bulk extract (scan)
• Change feed (using EmoDB databus)

What’s missing?
• Where, join, group by, or anything other than primary key lookup
• Use another indexing mechanism for complex queries (such as Elasticsearch, Solr, etc.)

Page 22: Cassandra at Bazaarvoice - EmoDB

SoR – Data storage challenges

Problem 1:
• Need a way to cheaply create tens of thousands of “tables”
• As of Cassandra 1.1, each column family (CF) needs at least 1 MB of memory on every node
• That is way too much overhead to dedicate a CF to each user-defined table

Hint: We’ll use only one column family to store all tables

Page 23: Cassandra at Bazaarvoice - EmoDB

SoR – Data storage challenges

Problem 2 (once Problem 1 is solved):
• Need to scan an entire table so that it can be indexed by Polloi (Elasticsearch)
• Requires a way to split tables into shards that enable sequential scans
• Shards for each table should be fully distributed over the Cassandra cluster

Page 24: Cassandra at Bazaarvoice - EmoDB

SoR – Data storage

Solution to both Problem 1 and 2:
• The row key byte buffer contains a 9-byte “table prefix”
  - Byte 0: 8-bit shard identifier
  - Bytes 1 – 8: 64-bit table UUID
• The remaining N bytes are the UTF-8-encoded content key
• The shard identifier is determined by the bottom 8 bits of the 32-bit Murmur3 hash of (table UUID | content key)
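A minimal Java sketch of the row key layout described above. It is not EmoDB's actual code: the class and method names are hypothetical, and the way the table UUID and content key are concatenated before hashing, as well as the seed of 0, are assumptions. The hash function is the standard MurmurHash3 x86 32-bit variant.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names are hypothetical, not EmoDB's API.
class RowKeySketch {

    // Standard MurmurHash3 x86 32-bit.
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int blocks = data.length & ~3;              // bytes consumed four at a time
        for (int i = 0; i < blocks; i += 4) {
            int k = (data[i] & 0xff)
                  | (data[i + 1] & 0xff) << 8
                  | (data[i + 2] & 0xff) << 16
                  | (data[i + 3] & 0xff) << 24;
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        int k = 0;
        switch (data.length & 3) {                  // one to three trailing bytes
            case 3: k |= (data[blocks + 2] & 0xff) << 16; // fall through
            case 2: k |= (data[blocks + 1] & 0xff) << 8;  // fall through
            case 1: k |= (data[blocks] & 0xff);
                    k *= c1;
                    k = Integer.rotateLeft(k, 15);
                    k *= c2;
                    h ^= k;
        }
        h ^= data.length;                           // finalization mix
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Layout: [1-byte shard][8-byte table UUID][UTF-8 content key].
    // Assumption: the hash input is the UUID bytes followed by the key bytes, seed 0.
    static byte[] rowKey(long tableUuid, String contentKey) {
        byte[] key = contentKey.getBytes(StandardCharsets.UTF_8);
        byte[] hashInput = ByteBuffer.allocate(8 + key.length)
                .putLong(tableUuid).put(key).array();
        byte shard = (byte) (murmur3_32(hashInput, 0) & 0xff); // bottom 8 bits
        return ByteBuffer.allocate(1 + 8 + key.length)
                .put(shard).putLong(tableUuid).put(key).array();
    }

    public static void main(String[] args) {
        byte[] k = rowKey(0x0123456789abcdefL, "demo1");
        System.out.printf("shard=%d length=%d%n", k[0] & 0xff, k.length);
    }
}
```

Because the shard byte comes first, keys of one table land in up to 256 different places in a byte-ordered key space, while keys within a single shard of a table stay contiguous.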

Page 25: Cassandra at Bazaarvoice - EmoDB

SoR – Scanning Table

• The shard identifier serves to spread content for a given table to avoid hotspots (using ByteOrderedPartitioner)
• All content for a table can be fetched in parallel using 2^8 = 256 range queries
• There you have it: a single CF offering range scans for segments (tables) that are fully distributed over the cluster!
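The 256 range queries can be sketched as follows, assuming the 9-byte prefix layout from the previous slide and unsigned byte-wise key ordering. The class and method names are hypothetical; this only computes the key bounds and does not talk to Cassandra.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; class and method names are hypothetical.
class ShardRangeSketch {

    // Row-key bounds for one shard of one table: start inclusive, end exclusive.
    static final class Range {
        final byte[] start;
        final byte[] endExclusive;
        Range(byte[] start, byte[] endExclusive) {
            this.start = start;
            this.endExclusive = endExclusive;
        }
    }

    // 9-byte table prefix: shard byte, then the 64-bit table UUID (big-endian).
    static byte[] prefix(int shard, long tableUuid) {
        return ByteBuffer.allocate(9).put((byte) shard).putLong(tableUuid).array();
    }

    // Smallest key that sorts after every key beginning with the prefix,
    // comparing bytes as unsigned; null means "no upper bound".
    static byte[] successor(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xff) {
                byte[] out = Arrays.copyOf(prefix, i + 1); // drop trailing 0xff bytes
                out[i]++;
                return out;
            }
        }
        return null;
    }

    // One range per shard value 0..255; each can be scanned independently.
    static List<Range> rangesFor(long tableUuid) {
        List<Range> ranges = new ArrayList<>(256);
        for (int shard = 0; shard < 256; shard++) {
            byte[] start = prefix(shard, tableUuid);
            ranges.add(new Range(start, successor(start)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<Range> ranges = rangesFor(42L);
        System.out.println(ranges.size() + " ranges; first end = "
                + Arrays.toString(ranges.get(0).endExclusive));
    }
}
```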

Page 26: Cassandra at Bazaarvoice - EmoDB

SoR – Scanning Table

Table UUIDs also solved another problem for us:
• Multiple tables can now be stored in the same CF
• Since we use a UUID, we can DROP a table and CREATE a new one with the same name
• A DROP’ed table is deleted lazily, which is especially important in an eventually consistent world

Page 27: Cassandra at Bazaarvoice - EmoDB

SoR – Parallel Scan

Page 28: Cassandra at Bazaarvoice - EmoDB

SoR – Parallel Scan

Call the getSplits() method to get a list of split identifiers

Then, in parallel, scan the data in each split by calling the getSplit() method

Java:

    Collection<String> getSplits(String table, int desiredRecordsPerSplit);

    Iterator<Map<String, Object>> getSplit(String table, String split,
                                           @Nullable String fromKeyExclusive,
                                           long limit, ReadConsistency consistency);
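A hedged sketch of how a client might drive these two calls in parallel. The SplitReader interface below simply mirrors the two signatures above (with the @Nullable annotation omitted); the interface name, the ReadConsistency values, the in-memory stub, and the desired split size of 10,000 are all assumptions for illustration, not EmoDB's client library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only; everything except the two method shapes is hypothetical.
class ParallelScanSketch {
    enum ReadConsistency { WEAK, STRONG }   // assumed values

    interface SplitReader {
        Collection<String> getSplits(String table, int desiredRecordsPerSplit);
        Iterator<Map<String, Object>> getSplit(String table, String split,
                String fromKeyExclusive, long limit, ReadConsistency consistency);
    }

    // Scans each split on a worker thread and returns the total number of rows seen.
    static long countRows(SplitReader reader, String table, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (String split : reader.getSplits(table, 10_000)) {
                Callable<Long> task = () -> {
                    long n = 0;
                    Iterator<Map<String, Object>> rows = reader.getSplit(
                            table, split, null, Long.MAX_VALUE, ReadConsistency.STRONG);
                    while (rows.hasNext()) { rows.next(); n++; }
                    return n;
                };
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : pending) total += f.get(); // wait for every split
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // In-memory stand-in for a real store; ignores fromKeyExclusive and consistency.
    static SplitReader inMemory(Map<String, List<Map<String, Object>>> rowsBySplit) {
        return new SplitReader() {
            public Collection<String> getSplits(String table, int desiredRecordsPerSplit) {
                return rowsBySplit.keySet();
            }
            public Iterator<Map<String, Object>> getSplit(String table, String split,
                    String fromKeyExclusive, long limit, ReadConsistency consistency) {
                return rowsBySplit.get(split).stream().limit(limit).iterator();
            }
        };
    }

    static Map<String, Object> row(String id) {
        return Collections.<String, Object>singletonMap("id", id);
    }

    public static void main(String[] args) {
        Map<String, List<Map<String, Object>>> data = new HashMap<>();
        data.put("split-0", Arrays.asList(row("a"), row("b")));
        data.put("split-1", Arrays.asList(row("c")));
        System.out.println(countRows(inMemory(data), "review:testcustomer", 4));
    }
}
```

A real scan would typically page through each split by passing the last key seen as fromKeyExclusive with a bounded limit, rather than requesting everything at once.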

Page 29: Cassandra at Bazaarvoice - EmoDB

SoR – Parallel Scan

Java code sample

Page 30: Cassandra at Bazaarvoice - EmoDB

SoR - Deltas

• Documents are stored as a sequence of deltas
• Readers evaluate deltas in order to produce the document
• Create, update, and delete documents by creating deltas
• Weak consistency: no document-level locking

Page 31: Cassandra at Bazaarvoice - EmoDB

SoR - Deltas

Page 32: Cassandra at Bazaarvoice - EmoDB

SoR - Deltas

• Typically there would be a replication conflict between t2 and t3
• But since each delta specifies only the fields it modifies, the deltas merge together cleanly and produce the desired result
• No cross-data center synchronous communication is required for concurrent modification
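The merge behavior can be illustrated with a small Java sketch. It assumes a simplified delta model (a timestamp plus a map of the fields being set) and made-up field values for t1, t2, and t3; EmoDB's real delta format is richer than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; a simplified stand-in for the delta model.
class DeltaMergeSketch {

    static final class Delta {
        final long timestamp;
        final Map<String, Object> changes;   // only the fields this delta modifies
        Delta(long timestamp, Map<String, Object> changes) {
            this.timestamp = timestamp;
            this.changes = changes;
        }
    }

    // Builds a delta from alternating field names and values.
    static Delta delta(long timestamp, Object... fieldsAndValues) {
        Map<String, Object> changes = new LinkedHashMap<>();
        for (int i = 0; i < fieldsAndValues.length; i += 2) {
            changes.put((String) fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return new Delta(timestamp, changes);
    }

    // Readers sort deltas by timestamp and fold them into one document,
    // so the arrival order at a given data center does not matter.
    static Map<String, Object> resolve(Collection<Delta> deltas) {
        List<Delta> ordered = new ArrayList<>(deltas);
        ordered.sort(Comparator.comparingLong((Delta d) -> d.timestamp));
        Map<String, Object> document = new TreeMap<>();
        for (Delta d : ordered) {
            document.putAll(d.changes);
        }
        return document;
    }

    public static void main(String[] args) {
        Delta t1 = delta(1, "author", "Bob", "rating", 5);
        Delta t2 = delta(2, "status", "APPROVED");  // written in one data center
        Delta t3 = delta(3, "rating", 4);           // written concurrently in another
        System.out.println(resolve(Arrays.asList(t1, t2, t3)));
        System.out.println(resolve(Arrays.asList(t3, t1, t2)));
    }
}
```

Both calls in main resolve to the same document because t2 and t3 touch different fields and resolution order depends only on timestamps.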

Page 33: Cassandra at Bazaarvoice - EmoDB

SoR - Deltas

• Recursive, pattern matching approach
• Operations available for:
  - Setting a value
  - Deleting a value
  - Updating a value for a key in a map
• No operation for modifying a list
  - Model a list using a map
  - A time UUID is a good candidate for list keys

Page 34: Cassandra at Bazaarvoice - EmoDB

SoR - Deltas

Literal – “smash” operation

Delete Map

Page 35: Cassandra at Bazaarvoice - EmoDB

SoR - Deltas

Conditional
• Perform a delta conditionally
• Designed to help resolve the most common concurrent write conflict situations
• Simple and reliable
• E.g., mark a review “approved” only if moderation hasn’t begun
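A rough Java sketch of the idea, under the assumption that a conditional delta is evaluated against the document state produced by all earlier deltas at resolve time. The Delta shape, the field names, and the statuses are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Illustrative sketch only; not EmoDB's delta syntax or resolver.
class ConditionalDeltaSketch {

    static final class Delta {
        final long timestamp;
        final Predicate<Map<String, Object>> condition; // checked against the state so far
        final String field;
        final Object value;
        Delta(long timestamp, Predicate<Map<String, Object>> condition,
              String field, Object value) {
            this.timestamp = timestamp;
            this.condition = condition;
            this.field = field;
            this.value = value;
        }
    }

    // Unconditional "set field" delta.
    static Delta set(long timestamp, String field, Object value) {
        return new Delta(timestamp, doc -> true, field, value);
    }

    // "Set field only if the condition holds" delta.
    static Delta setIf(long timestamp, Predicate<Map<String, Object>> condition,
                       String field, Object value) {
        return new Delta(timestamp, condition, field, value);
    }

    // Applies deltas in timestamp order, skipping any whose condition is false.
    static Map<String, Object> resolve(List<Delta> deltas) {
        List<Delta> ordered = new ArrayList<>(deltas);
        ordered.sort(Comparator.comparingLong((Delta d) -> d.timestamp));
        Map<String, Object> document = new TreeMap<>();
        for (Delta d : ordered) {
            if (d.condition.test(Collections.unmodifiableMap(document))) {
                document.put(d.field, d.value);
            }
        }
        return document;
    }

    public static void main(String[] args) {
        Predicate<Map<String, Object>> notModerated =
                doc -> !doc.containsKey("moderationStarted");
        Delta t1 = set(1, "status", "SUBMITTED");
        Delta t2 = set(2, "moderationStarted", true);
        Delta t3 = setIf(3, notModerated, "status", "APPROVED");
        System.out.println(resolve(Arrays.asList(t1, t3)));     // t2 not yet replicated
        System.out.println(resolve(Arrays.asList(t1, t2, t3))); // t2 has arrived
    }
}
```

In this sketch the approval takes effect only while the moderation delta is absent; once t2 replicates, every reader skips t3 and converges on the same document.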

Page 36: Cassandra at Bazaarvoice - EmoDB

SoR – Deltas

Other types of conditions
• Equal, Intrinsic, Is, Map, And, Or, Not, Constant
• E.g., {..,"type":or("product","category"),"client":"TestCustomer"}

Page 37: Cassandra at Bazaarvoice - EmoDB

SoR - Deltas

Read-Modify-Write
• Read the original state
• Compute the new version
• Either the write succeeds, or, eventually, the write conflicts and the databus fires an event so that the application can detect the conflict and retry the write

Page 38: Cassandra at Bazaarvoice - EmoDB

SoR - Deltas

Data center A
• T1
• Conditional T3

Data center B
• T1
• T2

Page 39: Cassandra at Bazaarvoice - EmoDB

SoR - Deltas

Compaction
• For efficiency, older deltas get compacted and replaced by a single delta: a “compaction” record
• Ensures intrinsics like ~version, ~firstUpdateAt, etc. are maintained
• Compaction happens opportunistically, whenever documents are read
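A simplified Java sketch of compaction, assuming deltas are already sorted by timestamp and that only deltas older than a "fully replicated" cutoff may be folded. The Delta shape, the cutoff parameter, and the way the version count is carried are assumptions, not EmoDB's storage format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; a simplified model of a compaction record.
class CompactionSketch {

    static final class Delta {
        final long timestamp;
        final Map<String, Object> changes;
        final int deltasRepresented; // 1 for an ordinary delta, more for a compaction record
        Delta(long timestamp, Map<String, Object> changes, int deltasRepresented) {
            this.timestamp = timestamp;
            this.changes = changes;
            this.deltasRepresented = deltasRepresented;
        }
    }

    static Delta set(long timestamp, String field, Object value) {
        return new Delta(timestamp, Collections.singletonMap(field, value), 1);
    }

    // Folds every delta older than the cutoff into a single record.
    // Input must be sorted by timestamp, oldest first.
    static List<Delta> compact(List<Delta> ordered, long fullyReplicatedBefore) {
        Map<String, Object> folded = new TreeMap<>();
        int represented = 0;
        long lastFolded = 0;
        List<Delta> kept = new ArrayList<>();
        for (Delta d : ordered) {
            if (d.timestamp < fullyReplicatedBefore) {
                folded.putAll(d.changes);
                represented += d.deltasRepresented; // keeps the version count intact
                lastFolded = d.timestamp;
            } else {
                kept.add(d);
            }
        }
        if (represented == 0) {
            return kept;                            // nothing old enough to fold
        }
        List<Delta> result = new ArrayList<>();
        result.add(new Delta(lastFolded, folded, represented));
        result.addAll(kept);
        return result;
    }

    // Version here is the number of deltas ever applied to the document.
    static int version(List<Delta> deltas) {
        int v = 0;
        for (Delta d : deltas) v += d.deltasRepresented;
        return v;
    }

    static Map<String, Object> resolve(List<Delta> ordered) {
        Map<String, Object> document = new TreeMap<>();
        for (Delta d : ordered) document.putAll(d.changes);
        return document;
    }

    public static void main(String[] args) {
        List<Delta> deltas = Arrays.asList(
                set(1, "author", "Bob"), set(2, "rating", 5),
                set(3, "status", "APPROVED"), set(4, "rating", 4));
        List<Delta> compacted = compact(deltas, 3);
        System.out.println(compacted.size() + " records, version " + version(compacted));
    }
}
```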

Page 40: Cassandra at Bazaarvoice - EmoDB

Databus

• Allows applications to get notified of updates to the SoR
• Must create a persistent subscription
  - To a table or to multiple tables (based on the value of attributes)
• The SoR “DVR”s updates for all subscriptions
• Supports multiple concurrent writers and readers (polls and acks)
• No guarantees on order
  - To help, the SoR provides ~version and ~signature
• Exposes a RESTful API

Page 41: Cassandra at Bazaarvoice - EmoDB

Databus – Subscription Management

Subscribe to changes to a set of tables in the System of Record

• Table filters are the same as conditions for deltas
• Follow events on all tables for which the condition evaluates to true
• To subscribe to all tables in the SoR, omit the condition or pass ‘alwaysTrue’

Page 42: Cassandra at Bazaarvoice - EmoDB

Databus

Subscribe for multiple tables

Count events

Page 43: Cassandra at Bazaarvoice - EmoDB

Databus

Poll for events
• Checks for unclaimed, unacknowledged events
• If events are not ack’d, they will be returned by another poll after the claim period expires

Renew claims

Acknowledge claims
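The poll, renew, and acknowledge cycle can be modeled with a small in-memory Java sketch. The class, its method names, and the explicit "now" parameter are assumptions made for illustration; the real databus is a REST service backed by Cassandra.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory model of claim-based polling; not the databus API.
class ClaimQueueSketch {
    private final long claimTtlMillis;
    // Event id -> time at which its current claim expires (0 means unclaimed).
    private final Map<String, Long> claimedUntil = new LinkedHashMap<>();

    ClaimQueueSketch(long claimTtlMillis) {
        this.claimTtlMillis = claimTtlMillis;
    }

    void publish(String eventId) {
        claimedUntil.putIfAbsent(eventId, 0L);
    }

    // Returns up to limit events that are unclaimed or whose claim has expired,
    // and claims them for the caller until now + TTL.
    List<String> poll(long now, int limit) {
        List<String> claimed = new ArrayList<>();
        for (Map.Entry<String, Long> e : claimedUntil.entrySet()) {
            if (claimed.size() == limit) break;
            if (e.getValue() <= now) {
                e.setValue(now + claimTtlMillis);
                claimed.add(e.getKey());
            }
        }
        return claimed;
    }

    // Extends the claim on events the caller is still working on.
    void renew(Collection<String> eventIds, long now) {
        for (String id : eventIds) {
            claimedUntil.computeIfPresent(id, (k, v) -> now + claimTtlMillis);
        }
    }

    // Acknowledged events are removed and never returned again.
    void acknowledge(Collection<String> eventIds) {
        claimedUntil.keySet().removeAll(eventIds);
    }

    public static void main(String[] args) {
        ClaimQueueSketch queue = new ClaimQueueSketch(1000);
        queue.publish("a");
        queue.publish("b");
        System.out.println(queue.poll(0, 10));     // both events are claimed
        queue.acknowledge(Arrays.asList("a"));
        System.out.println(queue.poll(500, 10));   // "b" is still claimed
        System.out.println(queue.poll(1000, 10));  // claim on "b" has expired
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real client would rely on the server's notion of time and simply re-poll.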

Page 44: Cassandra at Bazaarvoice - EmoDB

Blob Store

• REST storage service for photos
• No single point of failure (data loss only after 3 servers fail)
• Sweet spot is blobs of a few MB, not GB (not designed for video)
• Data replicates to all data centers
  - Except where replication is restricted for legal reasons
• Why not Amazon S3?
  - Lower latency: reads and writes are always served out of the local data center
  - If you don't read cross-data center, or you don't mind writing to buckets in multiple regions, use S3 or S3 + CloudFront

Page 45: Cassandra at Bazaarvoice - EmoDB

Highly Scalable Architecture

We serve traffic out of three AWS regions simultaneously

DNS Global Traffic Management sends user requests to the fastest region

Application services are all auto-scaled and self-healing

Our Cassandra-based EmoDB operates out of multiple Availability Zones, so that an AZ failure doesn’t result in downtime

Cassandra replicates across all three regions

Page 46: Cassandra at Bazaarvoice - EmoDB

Emo/Polloi Contributors

Aaron Dixon, Ahaduzzaman Munna, Dave Barcelo, Fahd Siddiqui, John Roesler, Mark Brandt, Matt Bogner, Nate Bauernfiend, Shawn Smith, Steven Grotten

Page 47: Cassandra at Bazaarvoice - EmoDB

Learn more

@Bazaarvoice

@BazaarvoiceDev

http://www.bazaarvoice.com/

http://blog.developer.bazaarvoice.com/