Cassandra At Wize Commerce
CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
Eran Chinthaka [email protected]
CASSANDRA AT WIZE COMMERCE
About me
• Engineer in Platform and Infrastructure team at Wize Commerce (formerly Nextag)
• Member, PMC member, and committer at the Apache Software Foundation
– Contributed to the Web Services project since 2004
• (in a different life) PhD in Computer Science from Indiana University, Bloomington, Indiana
• Today
In the next hour …
• Wize Commerce
• Impact of Cassandra on Wize Commerce
– Object Cache
– Personalized Search
• Performance evaluation of Cassandra in a multi-data center and a read/write heavy environment
WIZE COMMERCE
About Wize Commerce
• Helping companies maximize their eCommerce investments across every channel, device and digital ecosystem
– An expertise we've honed for years with our eCommerce customers
– Providing them with unmatched traffic and monetization services at incredible scale
• Scale of Wize Commerce
– We drive over $1.1 billion in annual worldwide sales
– Shopping network includes Nextag, guenstiger.de, FanSnap, and Calibex
– Each week, we manage
• 21 million keyword searches
• 105 million retargeted ads
• 140 million bot crawls
• 300 million Facebook ads
• 700 million keywords
• 560 million product SKUs
• 1000s of simultaneous A/B tests
CASSANDRA AT WIZE COMMERCE - CACHE
Cache Architecture
• Multi-tiered read-through cache, optimized for performance
• TTLs at upper levels to keep the data fresh
• JMS-based infrastructure to refresh objects on demand
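The read-through and TTL behavior described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names `TtlTier` and `ReadThroughCache`, the in-memory dictionaries standing in for each tier, and the `load_from_source` callback are all hypothetical and are not taken from the Wize Commerce implementation. The `refresh` method stands in for an on-demand refresh such as one triggered by a JMS message.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TtlTier:
    """Hypothetical in-memory cache tier whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Entry is stale: drop it so the read falls through to the next tier.
            del self._entries[key]
            return None
        return value

    def put(self, key: str, value: object) -> None:
        self._entries[key] = (self.clock() + self.ttl_seconds, value)

    def evict(self, key: str) -> None:
        self._entries.pop(key, None)


class ReadThroughCache:
    """Looks a key up tier by tier and falls back to the source on a full miss."""

    def __init__(self, tiers: List[TtlTier],
                 load_from_source: Callable[[str], object]):
        self.tiers = tiers  # ordered from the upper (fastest) tier downwards
        self.load_from_source = load_from_source

    def get(self, key: str) -> object:
        for depth, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                # Back-fill the tiers above the one that answered.
                for upper in self.tiers[:depth]:
                    upper.put(key, value)
                return value
        # Miss in every tier: read through to the system of record.
        value = self.load_from_source(key)
        for tier in self.tiers:
            tier.put(key, value)
        return value

    def refresh(self, key: str) -> object:
        """On-demand refresh: evict the key everywhere, then reload it."""
        for tier in self.tiers:
            tier.evict(key)
        return self.get(key)
```

Giving the upper tier a shorter TTL than the lower one means stale entries age out of the fast tier first, while most reads are still absorbed before they reach the source.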
Cache - Expectations
• For each object
– Less than 30ms 95th percentile read latency
– Less than 1 hour of update latency with 30M updates (phase 1, with existing components)
– 10 minutes with eventing system integrated
• Fault tolerance
• Low maintenance overhead
• Ability to scale
Cache – Cassandra Integration
• Replication factors to facilitate the required number of copies per region
• Consistency level to suit business requirements
• 6 multi-data center clusters, with total nodes per cluster ranging from 24 to 32
• In-house monitoring system for continuous monitoring and escalations
[Diagram: data centers DC1, DC2, DC3, DC4]
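How replication factor and consistency level interact can be shown with a little arithmetic. The sketch below computes how many replica acknowledgements a request needs under a few of Cassandra's consistency levels. The helper names and the per-data-center replication factors in the usage note are illustrative assumptions; the slides do not state the actual values used for the cache clusters.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """A quorum is a strict majority of the replicas."""
    return replicas // 2 + 1


def required_acks(level: str, rf_by_dc: Dict[str, int], local_dc: str) -> int:
    """Replica acknowledgements needed for a request at the given level.

    rf_by_dc maps a data center name to its replication factor
    (hypothetical values, not taken from the talk).
    """
    total = sum(rf_by_dc.values())
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])  # majority within the local DC only
    if level == "QUORUM":
        return quorum(total)               # majority across all DCs
    if level == "ALL":
        return total
    raise ValueError(f"unsupported consistency level: {level}")
```

With an assumed `{"DC1": 3, "DC2": 3}`, `LOCAL_QUORUM` needs 2 acknowledgements from the local data center, so a single local node failure does not block requests, whereas `QUORUM` needs 4 of 6 and therefore involves cross-data-center traffic.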
• Clients
– Hector with DynamicLoadBalancing policy
– Started experimenting with Astyanax
• Maintenance
– Weekly repair and compaction tasks
• Monitoring
– System health monitoring
– End-to-end latency
– Update latency
• Ring output of a cluster
Address      DC   Rack  Status  State   Load       Owns    Token
                                                           148873535527910577765226390751398592512
xx.xx.xx.79  DC1  RAC1  Up      Normal  90.19 GB   12.50%  0
xx.xx.xx.75  DC2  RAC1  Up      Normal  51.15 GB   0.00%   1
xx.xx.xx.75  DC3  RAC1  Up      Normal  126.62 GB  0.00%   2
xx.xx.xx.80  DC1  RAC1  Up      Normal  88.57 GB   12.50%  21267647932558653966460912964485513216
xx.xx.xx.81  DC1  RAC1  Up      Normal  89.82 GB   12.50%  42535295865117307932921825928971026432
xx.xx.xx.76  DC2  RAC1  Up      Normal  51.1 GB    0.00%   42535295865117307932921825928971026433
xx.xx.xx.76  DC3  RAC1  Up      Normal  124.49 GB  0.00%   42535295865117307932921825928971026434
xx.xx.xx.82  DC1  RAC1  Up      Normal  85.78 GB   12.50%  63802943797675961899382738893456539648
xx.xx.xx.83  DC1  RAC1  Up      Normal  84.34 GB   12.50%  85070591730234615865843651857942052864
xx.xx.xx.77  DC2  RAC1  Up      Normal  49.34 GB   0.00%   85070591730234615865843651857942052865
xx.xx.xx.77  DC3  RAC1  Up      Normal  123.54 GB  0.00%   85070591730234615865843651857942052866
xx.xx.xx.84  DC1  RAC1  Up      Normal  82.94 GB   12.50%  106338239662793269832304564822427566080
xx.xx.xx.85  DC1  RAC1  Up      Normal  83.1 GB    12.50%  127605887595351923798765477786913079296
xx.xx.xx.78  DC2  RAC1  Up      Normal  47.98 GB   0.00%   127605887595351923798765477786913079297
xx.xx.xx.78  DC3  RAC1  Up      Normal  121.25 GB  0.00%   127605887595351923798765477786913079298
xx.xx.xx.86  DC1  RAC1  Up      Normal  83.41 GB   12.50%  148873535527910577765226390751398592512
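The tokens in the ring output above follow a regular pattern: each data center divides the 0 to 2^127 token range evenly among its own nodes, and every data center after the first shifts its tokens by a small offset so that no two nodes share a token. A minimal sketch of that calculation, assuming the RandomPartitioner token range and a hypothetical helper named `dc_tokens`:

```python
# Size of the token space, assuming the RandomPartitioner (tokens 0 .. 2**127).
RING_SIZE = 2 ** 127


def dc_tokens(node_count: int, offset: int = 0) -> list:
    """Evenly spaced tokens for one data center, shifted by a per-DC offset."""
    return [i * RING_SIZE // node_count + offset for i in range(node_count)]


dc1 = dc_tokens(8)            # 8 nodes in DC1, no offset
dc2 = dc_tokens(4, offset=1)  # 4 nodes in DC2, "token + 1"
dc3 = dc_tokens(4, offset=2)  # 4 nodes in DC3, "token + 2"

for name, tokens in (("DC1", dc1), ("DC2", dc2), ("DC3", dc3)):
    for token in tokens:
        print(name, token)
```

The node counts (8, 4 and 4) are read off the ring output; the printed tokens reproduce the Token column for each data center.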
• Column family stats of a cluster
Keyspace: XXXX
Read Count: 37060467
Read Latency: 3.0589244618800944 ms.
Write Count: 37013052
Write Latency: 0.05114632081677566 ms.
Pending Tasks: 0
Column Family: YYY
SSTable count: 11
Space used (live): 71463479840
Space used (total): 71463479840
Number of Keys (estimate): 66231424
Memtable Columns Count: 314964
Memtable Data Size: 68140546
Memtable Switch Count: 628
Read Count: 37060467
Read Latency: 3.138 ms.
Write Count: 37013052
Write Latency: 0.058 ms.
Pending Tasks: 0
Bloom Filter False Postives: 10653
Bloom Filter False Ratio: 0.01611
Bloom Filter Space Used: 173770024
Key cache capacity: 60000000
Key cache size: 13309399
Key cache hit rate: 0.9210111414757199
Row cache: disabled
Compacted row minimum size: 925
Compacted row maximum size: 8239
Compacted row mean size: 2488
Cache – Cassandra Datastore Performance
• 3-6ms average read latency across all objects in all data centers
• 15-20ms 95th percentile read latency
• 30 minutes average update latency at 25M updates
• Zero downtime even with multiple node failures
Cache – Snapshot of Live System
[Charts: Median Read Latency; Objects Scrubbed in Last 24hrs; Scrubber Latency]
Cassandra Integration – Lessons Learned
• Try to understand the internals, read the code, and find solutions on your own before filing support requests
– Assumption: you have adventurous engineers :D
– Use IRC channels and user lists
• Never use RoundRobinLoadBalancingPolicy if you care about performance
– DynamicLoadBalancingPolicy: based on the probability of failure of a node
• Divide the keyspace within the data center and use the token + 1 method in other data centers
• Experiment with different configurations, but make sure to have a quick fallback plan
• Compactions are crucial in a read/write heavy environment
• 24 x 7 automated monitoring and alerts
– At a minimum: read/write latencies, read misses, and node status
• Consistency levels are important if you expect node failures in a multi-data center environment
• Concentrate on the key cache and forget about the row cache if you have limited resources
– Rely on the OS file cache
Cache: Future
• Exposing the cache system using SOA-based infrastructure
– Thrift services enabling all cache accesses
• Event-based updates
– Event-based pipeline for system-of-record changes
– Based on Storm (Twitter)
• Getting rid of Memcached
CASSANDRA AT WIZE COMMERCE – PERSONALIZED SEARCH
Personalized Search
• Aggregates user data from multiple data sources, e.g. site search, banner clicks.
• Uses a statistical model to re-rank search results so that they are tailored to the user.
• Decomposes user information into model variables: brand preference, merchant preference, product category preference, etc.
Personalized Search – Cassandra Integration
• Serves 30-40MM banner ad impressions daily
• Before: rely on the user cookie (stores up to 4 weeks of data)
• After: use the user cookie for today's data, combined with the Cassandra data store to keep up to 3 months of data
PERFORMANCE EVALUATION OF APACHE CASSANDRA IN A MULTI-DATA CENTER, READ/WRITE HEAVY ENVIRONMENT
Objectives
• Understand the limitations of Cassandra when deployed in a multi-data center environment
• Find the best set of parameters that can be tuned to improve performance
• Find the limits of a Cassandra cluster for each version
• Understand its scalability characteristics with a varying number of operations per second
– This will help us understand how much load we can serve without significant performance degradation
• Understand the implications of node failures on its ability to serve client requests efficiently
Environment Setup
• Test Metrics
– Operation latency for a given throughput (set in the client)
• Average, minimum, maximum, 95th percentile
• Test Setup
– Versions: Apache Cassandra 0.8.6 and 1.0.1
– Node Distribution: 12 nodes distributed over three geographically distributed data centers in the US
– Key Distribution: The keyspace is divided into four in each data center, and each node in the cluster is responsible for 1/4th of the keyspace
– Replication Factor: 3. Each data center has a copy of the data.
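The four test metrics listed above can be derived from raw latency samples as follows. This is a sketch only: YCSB reports its own statistics, and the nearest-rank percentile definition and the function names used here are assumptions made for illustration.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample that is greater than
    or equal to pct percent of all samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Average, minimum, maximum and 95th percentile of the latencies."""
    return {
        "average": mean(latencies_ms),
        "minimum": min(latencies_ms),
        "maximum": max(latencies_ms),
        "p95": percentile(latencies_ms, 95),
    }
```

The 95th percentile is reported alongside the average because a small number of slow requests can hide behind a healthy-looking mean.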
• Hardware Setup
– Dell R410
• 2 quad-core CPUs with hyper-threading
• 8 x 4GB RAM
• PERC 6/i RAID controller with 4 x 450GB 15k RPM drives
• GigE network
• CentOS 5.7
• Clients
– Uses Yahoo Cloud Serving Benchmark (YCSB)
– Two clients in each data center, for a total of 6 clients
– Records metrics at 10s intervals
• Every test case is independent of the others
Workload
• Read:Write ratio is 1:1
• Thread Counts: 256 from each client (total of 6 clients, 2 from each data center)
– Each client contacts Cassandra nodes only in its own data center (no cross-data center traffic)
• Key Distribution: Zipfian
• Record Count: 100 million
• Total Operations Per Test Case: 1 million
• Target Operations per Second: Varies
• Test Data
– Columns per row: 10
– Compacted row minimum size: 150
– Compacted row maximum size: 1331
– Compacted row mean size: 736
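For readers who want to reproduce a similar workload, the parameters above map onto YCSB CoreWorkload properties roughly as sketched below. The actual workload file used in these tests is not shown in the slides, so treat this as an approximation under assumptions: writes are modeled as updates rather than inserts, "columns per row" is mapped to `fieldcount`, and the operation count is taken as given without knowing whether it was per client or in total. The helper name `ycsb_properties` is hypothetical; thread count and target throughput are left to the client invocation.

```python
def ycsb_properties(record_count: int, operation_count: int,
                    read_fraction: float, field_count: int = 10,
                    distribution: str = "zipfian") -> str:
    """Render a YCSB CoreWorkload properties file as text."""
    props = {
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": record_count,
        "operationcount": operation_count,
        "fieldcount": field_count,               # assumed: columns per row
        "readproportion": read_fraction,
        "updateproportion": 1 - read_fraction,   # assumed: writes are updates
        "scanproportion": 0,
        "insertproportion": 0,
        "requestdistribution": distribution,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Values from the workload slides: 100 million records, 1 million
    # operations per test case, 1:1 read/write ratio, Zipfian keys.
    print(ycsb_properties(100_000_000, 1_000_000, 0.5))
```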
Test Cases
• Parameters varied in each test case
– Apache Cassandra version: 0.8.6 vs 1.0.1
– Concurrent read and write threads in a Cassandra node
– Number of keys cached
Test Number  Description
1            Cassandra 0.8.6 binary as is, with no changes (concurrent reads/writes = 32 and keys cached = 200k). Base case for 0.8.6.
2            Cassandra 0.8.6 with 64 concurrent reads and writes. Keys cached increased to 1 million.
3            Cassandra 0.8.6 with 64 concurrent reads and 32 concurrent writes. Keys cached increased to 1 million.
4            Cassandra 1.0.1 with 64 concurrent reads and writes. Keys cached increased to 1 million.
5            Cassandra 1.0.1 with 64 concurrent reads and 32 concurrent writes. Keys cached increased to 1 million.
6            Failure test: Cassandra 1.0.1 with 64 concurrent reads and 64 concurrent writes. Keys cached increased to 1 million.
For each test case, we plot operations per second (varied from 3000 to 24000) against read/write latency.
• Failure Test
• Brought down a node in the east coast data center (DC2) and ran the test varying the operations per second
• A node going down has three implications for latency
1) Our test clients time out after 300 retries to connect to the failed node
2) Our nodes in DC2 will go to DC3 to serve data that is not available in DC2 due to the node failure
3) Our nodes in DC3 will have requests coming from the nodes of DC2, putting more load on them
Results
Test Case 1: Varying OPS with Cassandra 0.8.6 Default Configuration
• Read latency with the default configuration rises beyond 25ms after 3000 OPS.
• Even though write performance stays almost constant, the poor read performance is a concern with this configuration.
Test Case 2: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
• Even with good write performance, read latency goes beyond our threshold of 25ms after 12000 OPS
Test Case 3: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 Concurrent Writes, 32 Concurrent Reads and 1 million keys cached)
• latency goes beyond 25ms after reaching 18000 OPS
• better and consistent write performance
Test Case 4: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
• Read performance has improved significantly; even at 24000 OPS, read latency stayed well below the 10ms range.
• better and consistent write performance
Test Case 5: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Writes, 32 Concurrent Reads, 1 million keys cached)
• A degradation of read performance compared to test case 4
• Latency goes beyond 25ms after reaching 21000 OPS
Test Case 6: Failure Test - Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
• A node going down has three implications for latency
1) Our test clients time out after 300 retries to connect to the failed node
2) Our nodes in DC2 will go to DC3 to serve data that is not available in DC2 due to the node failure
3) Our nodes in DC3 will have requests coming from the nodes of DC2, putting more load on them
[Charts: latency vs operations per second in DC2]
• Average latency increases in both the DC2 and DC3 data centers, but even with the node failure the latency stayed below 25ms.
[Charts: latency vs operations per second in DC3]
Comparisons
Cassandra 0.8.6
Cassandra 1.0.1
• 64 concurrent reads and writes with 1 million keys cached performed significantly better than the other configurations in terms of read performance
Cassandra 1.0.1 vs 0.8.6 Average Read Performance Comparison
• 64 concurrent reads and writes with 1 million keys cached performed significantly better than the other configurations in terms of read performance
Cassandra 1.0.1 vs 0.8.6 95th Percentile Read Performance Comparison
• 64 concurrent reads and writes with 1 million keys cached performed significantly better than the other configurations in terms of read performance
Performance Evaluation: Conclusions
• With Cassandra 1.0.1, 64 concurrent reads and writes, and 1 million keys cached, we could serve 24000 operations per second with latency under 15ms
• Node failure tests show that in this configuration we can serve higher load in the cluster with latency under 25ms
• Even the 95th and 99th percentile latencies for this configuration are well within our expected limits
Excited about the work?
We’re hiring !!
Thank you !!
Questions !! (Presentation is available at http://goo.gl/Ba9o4)