eBay: From here to there: our journey to 1000s of nodes – Couchbase Connect 2016
Feng Qu, Sr MTS, eBay Database Infrastructure
From here to there: Our journey to 1000s of nodes
Couchbase Connect 2016
Feng Qu - Sr MTS in eBay DBA Team
• Have worked on Oracle since the early 1990s
• Have worked on Cassandra, MongoDB and Couchbase since 2011
• Led company-wide NoSQL projects
• 2014 and 2015 DataStax Cassandra MVP
• Speaker at the 2013, 2014 and 2015 Cassandra Summits
• Speaker at EDW 2016
• Speaker at NoCOUG 2016
eBay At A Glance
• Active Listings: 1.1B
• Active Users: 164M
• Total DB Calls: 610B/day
• Y-o-Y Growth: 30%-35%
• Total DB Servers: 4000+
• Peak DB Calls: 15M/sec
• RDBMS Calls: 500B/day
• NoSQL Calls: 110B/day
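The daily call volumes on this slide can be cross-checked and turned into per-second figures with a little arithmetic. The sketch below is only a back-of-the-envelope check; it assumes "average" means the daily total spread evenly over 24 hours, which real traffic certainly is not.

```python
# Figures copied from the "eBay At A Glance" slide
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

RDBMS_CALLS_PER_DAY = 500 * 10**9
NOSQL_CALLS_PER_DAY = 110 * 10**9
TOTAL_CALLS_PER_DAY = 610 * 10**9
PEAK_CALLS_PER_SEC = 15 * 10**6


def db_call_summary() -> dict:
    """Derive simple ratios from the daily call volumes on the slide."""
    # The RDBMS and NoSQL volumes should add up to the reported total
    assert RDBMS_CALLS_PER_DAY + NOSQL_CALLS_PER_DAY == TOTAL_CALLS_PER_DAY
    # Assumption: a flat average over the whole day
    avg_per_sec = TOTAL_CALLS_PER_DAY / SECONDS_PER_DAY
    return {
        "avg_calls_per_sec": avg_per_sec,                          # roughly 7.06M/sec
        "peak_to_avg": PEAK_CALLS_PER_SEC / avg_per_sec,           # roughly 2.1x
        "nosql_share": NOSQL_CALLS_PER_DAY / TOTAL_CALLS_PER_DAY,  # roughly 18%
    }


if __name__ == "__main__":
    for name, value in db_call_summary().items():
        print(f"{name}: {value:,.3f}")
```

Read this way, the 15M/sec peak is a bit more than twice the flat daily average, and NoSQL stores serve a little under a fifth of all database calls.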
Challenges of Traditional RDBMS
• Performance penalty to maintain ACID features
• Lack of native sharding and replication features
• Cost of software/hardware
• Higher cost of commit
Different Databases Serve Different Purposes
NoSQL Databases Pros and Cons
Pros:
• Geo-distributed replication & sharding
• Location-aware, low-latency query performance
• Optimized for specific workloads & access patterns
• Linear scalability with reduced disruption to business
• Supports semi-structured or unstructured data
• Flexible schema provides a significant increase in dev agility
Cons:
• Lacks strict ACID-compliant transactions
• Lacks strong data model control & governance
• Not suitable for ad-hoc workloads & random access patterns
• Requires a change of mindset, ecosystem and infrastructure
• Rapidly changing technology & competitive landscape
• Requires dev expertise in the nuances of distributed systems
MongoDB Pros and Cons
Pros:
• Dev-friendly, rich JSON document model
• Secondary indexes enable mixed access patterns
• High-business-value (semi-)structured data
• Balanced scale-out reads & writes (with optional sharding)
• Straightforward admin effort
Cons:
• Short write interruption during primary re-election
• Not suitable for nanosecond-latency writes
• Potentially high TCO for large-scale sharded clusters
• Lacks resource isolation
Cassandra Pros and Cons
Pros:
• Peer-to-peer with no SPOF (single point of failure)
• Active-active across data centers
• High read & very high write performance
• Absolute linear scalability
Cons:
• Inefficient secondary indexes (pre-v3)
• Not suitable for mixed user query & access patterns
• High compaction overhead for frequent random deletes
• Requires JVM tuning to mitigate GC pauses
• Lacks resource isolation
• Slow cluster rebalancing
Couchbase Pros and Cons
Pros:
• Memcached-compatible persistent document store
• Peer-to-peer architecture
• High read & write performance
• Active-active cluster replication
• Strong local-cluster read/write consistency
• Resource isolation
Cons:
• Short write interruption during node failover
• Counterintuitive cross-DC write conflict resolution (pre-v4.6)
• Slow cluster rebalancing
• Slow warm-up
NoSQL Footprints at eBay
Besides Oracle/MySQL, we also have:
• Cassandra
• Couchbase
• MongoDB
• HBase
• Memcached
• Neo4j
• OpenTSDB
• Redis
• …
Why Couchbase?
• Memcached-compatible persistent caching
• Elastic scalability
• High read/write performance & throughput
• Active-active bidirectional XDCR
• High local-cluster read/write consistency
• Flexible document model
• Development agility
• SQL integration
• And more…
Environment
• Support both dedicated & multi-tenant clusters
• Couchbase Enterprise 3.1/4.5 running on bare metal (BM) & VMs
  • High I/O flavor
  • High memory flavor
  • High storage flavor
• Customized RPM
  • Customized to suit the eBay environment
  • Easy to install, upgrade and maintain; ensures deployment consistency across the board and makes deployment differences easy to identify
  • Built-in predefined tuning parameters when needed
• Homegrown client wrapper for central application logging and reporting
• QA/LnP/PreProd/Prod environments
Couchbase Onboarding Process
• Understand product limitations: avoid known anti-patterns and look beyond the generic use case
• NoSQL product evaluation & selection: business & technology perspectives; product selection flowchart & detailed scorecard
• Data modeling, POC with LnP, failover & DR testing: review test results, re-evaluate initial assumptions
• Capacity planning and provisioning
eBay Couchbase At A Glance
• Total Clusters: 120
• Total Servers: 1400
• Couchbase Calls: 80B/day
• Y-o-Y Growth: >100%
• Total Data Size: 90TB
• Total Documents: 60B
• Peak Sets/Cluster: 800,000/sec
• Peak Gets/Cluster: 1,200,000/sec
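A few fleet-wide averages fall out of these totals. This is a rough sketch: it assumes decimal terabytes and that the 90TB figure counts one copy of the data (if it includes replicas, the real average document is smaller), and it treats load as uniform across clusters and across the day.

```python
# Figures copied from the "eBay Couchbase At A Glance" slide
SECONDS_PER_DAY = 86_400

CLUSTERS = 120
SERVERS = 1_400
CALLS_PER_DAY = 80 * 10**9
DATA_BYTES = 90 * 10**12  # assumption: decimal terabytes (10**12 bytes)
DOCUMENTS = 60 * 10**9


def fleet_averages() -> dict:
    """Back-of-the-envelope averages; real clusters are far from uniform."""
    return {
        "servers_per_cluster": SERVERS / CLUSTERS,             # ~11.7 nodes
        "avg_calls_per_sec": CALLS_PER_DAY / SECONDS_PER_DAY,  # ~926K across the fleet
        "avg_doc_bytes": DATA_BYTES / DOCUMENTS,               # ~1.5 KB per document
    }


if __name__ == "__main__":
    for name, value in fleet_averages().items():
        print(f"{name}: {value:,.1f}")
```

Averages like these mainly serve as a sanity check on capacity plans; a single busy cluster's peak (up to 1.2M gets/sec) can exceed the whole fleet's flat daily average.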
Typical Use Cases
• Write intensive
  • User session tracking
  • 13 billion writes per day
• Read intensive
  • Email notification
  • 4 billion reads per day
• Mixed workload
  • Central monitoring platform where metrics are collected in real time for hundreds of thousands of devices
  • 2 billion writes per day
  • 10 billion reads per day
Global User Preference
• Global repository with a streamlined service for managing worldwide user preferences that come from the data warehouse, such as:
  • Seller advertising
  • Member communication
  • User account settings
  • Notification preferences, etc.
Central Monitoring and Alerting
Entire eBay site monitoring system is built on Couchbase!
• We run two sets of clusters (active/passive), A (3 DCs) and B (3 DCs), to allow upgrades and patching
Elastic Scalability
• Benchmarking
  • Establish a performance baseline for new hardware and new software releases
  • Enforce full-scale testing in a dedicated LnP environment before going to production
• In general, scale out by adding more nodes to increase throughput or reduce latency
• Sometimes it is more cost-efficient to scale up at the component level: identify the scaling bottleneck, then resolve it accordingly
• Scale up (vertical)
  • Smaller data center footprint (space, power, cooling)
  • Lower license cost
• Scale out (horizontal)
  • Cheaper, using commodity hardware
  • More fault tolerant
  • (Unlimited) upgradability
Couchbase Learning Experience
• Lack of always-available writes
  • Application option to write to the remote DC when the local write fails
• Cross-DC update conflict resolution
  • Unpredictable behavior, but new features in v4.6 solve this
• Metadata memory overhead
  • 56 bytes of metadata is too much if you have small keys
• Memory fragmentation
  • CB 4.x replaces TCMalloc with the jemalloc library
• Slow rebalance
  • Use swap rebalance when applicable
• Slow warm-up
  • Remove the access log to speed up warm-up
• The 10-bucket limit does not work well for a shared QA environment
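The first two lessons interact: once the application falls back to the remote DC, the same key can be updated in both DCs before XDCR reconciles them. Below is a simplified Python model of the two resolution strategies, revision-based ("most updates wins", the pre-4.6 behavior) and timestamp-based last-write-wins (added in 4.6). It assumes only two metadata fields, a revision count and a timestamp; real XDCR also compares other document metadata. `FakeCluster`, `ClusterDown` and `write_with_fallback` are hypothetical names, not Couchbase SDK APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One DC's copy of a document (simplified; not Couchbase's real metadata layout)."""
    value: str
    rev_seq: int    # how many mutations this copy has seen
    timestamp: int  # logical write time, standing in for the CAS value


def resolve_by_revision(a: Version, b: Version) -> Version:
    """Revision-based: higher rev_seq wins, timestamp breaks ties."""
    return max(a, b, key=lambda v: (v.rev_seq, v.timestamp))


def resolve_by_timestamp(a: Version, b: Version) -> Version:
    """Last-write-wins: later timestamp wins, rev_seq breaks ties."""
    return max(a, b, key=lambda v: (v.timestamp, v.rev_seq))


class ClusterDown(Exception):
    """Hypothetical error for a rejected write; real SDKs raise their own types."""


class FakeCluster:
    """In-memory stand-in for one DC's cluster (hypothetical, not an SDK class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.docs: dict[str, Version] = {}

    def upsert(self, key: str, value: str, now: int) -> Version:
        if not self.up:
            raise ClusterDown(self.name)
        old = self.docs.get(key)
        new = Version(value, old.rev_seq + 1 if old else 1, now)
        self.docs[key] = new
        return new


def write_with_fallback(local: FakeCluster, remote: FakeCluster,
                        key: str, value: str, now: int) -> str:
    """Write to the local DC; if it rejects the write, send it to the remote DC.
    Returns the name of the cluster that accepted the write."""
    try:
        local.upsert(key, value, now)
        return local.name
    except ClusterDown:
        remote.upsert(key, value, now)
        return remote.name


# Both DCs start from the same replicated version of "k".
dc1, dc2 = FakeCluster("dc1"), FakeCluster("dc2")
dc1.docs["k"] = dc2.docs["k"] = Version("v0", rev_seq=1, timestamp=0)

dc1.upsert("k", "a", now=1)  # rev 2 in dc1
dc1.upsert("k", "b", now=2)  # rev 3 in dc1
dc1.up = False               # dc1 rejects writes before XDCR has caught up
accepted_by = write_with_fallback(dc1, dc2, "k", "c", now=10)  # rev 2 in dc2

# Revision-based keeps the older write "b"; last-write-wins keeps the newer "c".
print(accepted_by,
      resolve_by_revision(dc1.docs["k"], dc2.docs["k"]).value,
      resolve_by_timestamp(dc1.docs["k"], dc2.docs["k"]).value)
```

The model shows why revision-based resolution felt counterintuitive: the copy that was mutated more often wins even when the other DC holds the more recent write.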
Couchbase Wish List
• We would like to see:
  • Point-in-time recovery, so we can store system-of-record (SOR) data
  • A global admin console to manage multiple clusters
  • Smaller metadata to reduce memory requirements
  • Robust rebalance
  • Lazy warm-up, so a failed node can rejoin more quickly
  • Simplified XDCR/compaction tuning
  • One log, just one log
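Why smaller metadata matters is easiest to see with numbers. The sketch below estimates the RAM taken by keys plus the 56-byte per-item metadata quoted earlier, assuming that every copy of an item (active and each replica) keeps its key and metadata resident regardless of whether its value is cached. The bucket size, key lengths and replica count are hypothetical inputs, not eBay figures.

```python
META_BYTES_PER_ITEM = 56  # per-item metadata figure quoted in the talk


def key_and_meta_ram_bytes(items: int, avg_key_bytes: int, replicas: int) -> int:
    """RAM needed for keys + metadata alone, assuming every copy
    (active plus each replica) keeps its key and metadata resident."""
    copies = 1 + replicas
    return items * (META_BYTES_PER_ITEM + avg_key_bytes) * copies


def metadata_share(avg_key_bytes: int) -> float:
    """Fraction of the per-item key+metadata footprint that is metadata."""
    return META_BYTES_PER_ITEM / (META_BYTES_PER_ITEM + avg_key_bytes)


if __name__ == "__main__":
    # Hypothetical bucket: 1 billion items with 1 replica
    for key_len in (16, 100):
        gb = key_and_meta_ram_bytes(10**9, key_len, replicas=1) / 10**9
        print(f"key={key_len:>3}B  key+meta RAM={gb:,.0f} GB  "
              f"metadata share={metadata_share(key_len):.0%}")
```

With 16-byte keys, metadata is more than three quarters of that fixed footprint, which is the "too much if you have small keys" problem in concrete form.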
Questions?
eBay is hiring experienced NoSQL professionals; please send your resume to [email protected]