Cassandra and Riak at BestBuy.com

download Cassandra and Riak at BestBuy.com

If you can't read please download the document

Transcript of Cassandra and Riak at BestBuy.com

01/16/2014Cassandra and riak at bestbuy.com

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Who We areBest Buy is the worlds largest multi-channel consumer electronics retailer, with stores in the United States, Canada, China and Mexico.We are the 10th largest online retailer in the United StatesMore than 1.6 billion visitors come to our stores and BestBuy.com each year.Our My Best Buy loyalty program is among the largest loyalty programs of its kind, with more than 40 million active members.We provide customers with outstanding choice, unbiased advice and unmatched support for the tech needs.

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014

2

A Unique Customer PromiseTHE LATEST DEVICES AND SERVICES, ALL IN ONE PLACEIMPARTIAL & KNOWLEDGEABLE ADVICE COMPETITIVE PRICES THE ABILITY TO SHOP WHEN AND WHERE YOU WANT SUPPORT FOR THE LIFE OF YOUR PRODUCTS

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Joel Kannan crabb SwaminathanChief Architect, BestBuy.comDirector, Web Architecture

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Nosql at Bestbuy.comSchema-less designKeyvalue systemsSparse column systemsNon-relational data Distributed natureEventual consistencyActive-Active across clouds and datacentersHigh reliabilityHorizontal scaling

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 BestBuy.com

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014

Near-InfiniteBursts

7X traffic spikesBursts > 50,000 rps#4 in eCommerce traffic during 2013 HolidayScalability goals

WalmartBest Buy

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Flexibility Goals

Low cost of changeFast concepts to siteDaily releasesMultiple versions

One day of work vs. two months

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014

100% availabilityZero defects~ 2s response times

Achieved 100% cloud uptime during Holiday

Reliability goals

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Cloud Architecture ConceptsClouds fail; plan for itMultiple availability zonesMultiple regionsMultiple vendorsDatacenter connections fail; plan for itServe pages completely from cloudBrowse-only fallback mode

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 CDN: Global Traffic ManagerBrowse Cloud

Vendor 1Browse Cloud

Vendor 2Best Buy Datacenter

Cloud Architecture Concepts

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014

riak

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Product catalog - RequirementsBusiness RequirementsEasily add new attributes to productsUse existing product content feeding systemsProvide enhanced product data to all of Best BuyTechnical RequirementsScale to BestBuy.coms needsOne-way replication from DC to cloudProvide a Product Catalog API usable by Best Buy applications internally (DC) and externally (cloud)

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Browse Cloud - Architecture

Product Data

Service AggregatorLoad BalancerWeb AppWeb App Persistent Cache

Product DataLegacy Services and Product DataDatacenter

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Why riak for the product catalogKey-value StoreEasily add new attributesDifferent attribute sets for different product categoriesHub and Spoke ReplicationMultiple cloud instances can be out-of-synch for seconds or minutes; this is OKConnection to Datacenter can be lostResilience within all tiersRiaks ring architecture allows single instances to fail with little impact to the system

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Why riak for the product catalogLegacy BridgeExtract data from legacy systemExpose to anyone who needs the dataExtends decision to retire legacy systemInput into Product EvolutionBest Buy is stretching Riak in multiple areasFeedback on Search and ReplicationDirect connection to Riak engineers

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Product catalog deployment

Datacenter Master

Cloud Replicant

Cloud ReplicantEach Cloud Replicant is a self-contained instance which can intermittently lose connectivity to the Datacenter

V1.2.1V1.4.2

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Product Catalog - Data FlowLegacy Product DataNoSQL Product DataGranularServiceAggregatorServiceWebApplication

NoSQL Product Data

DatacenterExtractionReplicationCloud

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Riak Lessons learnedSearch eats up the CPU (Pre-Yokozuna)

Search Endpoint Removed

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Riak lessons learnedMulti-datacenter replication is hard at scaleObject replication fails silently (v1.4.2)

4000 objects/sec

1.5 hours for partial update

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014

20

Did the product catalog succeed?Scale generally < 5 ms response times in cloudScope provides APIs in cloud and DC

4000 objects/sec1.5 hours for partial update

4 ms @ 95% Single zone failure, response times jump to 8-10 ms

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014

21

cassandra

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Customer graph - RequirementsBusiness RequirementsCreate a single identity providerBuild an adaptive model to support a 360 degree view of the customer, customer segmentation and multi-channel personalizationExtensible framework to support federated identity interoperability with external providersTechnical RequirementsScale to BestBuy.coms needs with a distributed service oriented architectureSupport risk based authentication and secure service interfaces for consumers

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 customer graph - Architecture

Load BalancerAPIAPI

Load BalancerAPIAPI

Data Center 1 / Region 1Data Center n / Region nCDN + GTM + WAFTopology: NTSRF: 3Partitioner: Random

Cassandra Ring

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 customer graph - conceptualCreate Connection:POST /customers/Kannan/following/customers/Joel

Get all Kannans connections:GET /customers/Kannan/connections

Get Joels followers:GET customers/Joel/followers

Get customers connecting to a specific store:GET /stores/{store id}/connecting/customerSamsung 50 SMART TVKindle Fire 8.9 HDXlinksprefersmanagesusesprefersbuyslikesbuysprefersSamsung Galaxy IIIMy Best Buy Membership #KannanJoelfollowingBest Buy @ Minnetonka, MNBest Buy @ Eden Prairie, MNMy Wish ListcontainsXbox One

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Customer graph conceptual

LoginItem BItem ClinksprefersmanagesusesprefersownsbuyslikesbuyslikesItem DBlogsprefersExternal Identity ProviderDevice ARewards #User AUser BfollowinglinkslikesreviewsauthorsStore BStore AHoliday List AcontainsItem AProductsTopics

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Why cassandra?No single point of failureDelivers on Atomicity, Isolation & DurabilityEventual ConsistencyTunable Consistency for Reads vs. WritesLinear ScalabilityQuerying a column slice or a range of row keys Data can have expiration setReliable multiple data center replication

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Lessons learned data modelingUse composite column names; static composite typesColumn names are stored physically sorted and indexedStore values as column namesWide rows in conjunction with composite columns can be used to build indices, butFor larger data sets, distribute the columns among rowsA secondary index is best modeled as a separate column family

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Lessons learned data modelingDo not store an entity as a single column blobCannot index and query on entity attributesUpdates to an entity attribute would require a read and then a writeMutate just the required columns on an entity rowDo not read and then write

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Lessons learned - monitoringHeap Size and UseGCMutation Stage (Writes) & Read Stage (Reads)Active and PendingAE Service, Stream & Message-Streaming-Pool StagesEspecially during scheduled repair or rebuild (while adding nodes to a new data center)Streaming requires keep-alive connections (watch out for firewalls terminating idle established connections; update periodic tcp keep-alive ping rate)Compaction Stage & Compaction Count IO Wait, Limits - nofiles, nproc

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 Implementation strategyStart with a project teamRiak Product CatalogCassandra Customer GraphBuild expertise in development and operationsAs usage grows, create a Platform teamCombine engineering into one teamProject team can then focus on business featuresFocus on automation

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 conclusionsRiak and Cassandra have both been successfulRiak No complete outages in two yearsCassandra Flawless in its first Holiday in 2013Differences Replication patterns are differentWrite and read treatment is differentDeployment pattern are differentHigh AvailabilityBoth have highly resilient architecturesBoth scale linearly

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014 ??

@Copyright 2014 Best Buy, Inc. All rights reserved.January 16, 2014