HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBase, the Use Case in eBay Cassini
Thomas Pan
Principal Software Engineer
eBay Marketplaces
eBay Marketplaces
97 million active buyers and sellers worldwide
200+ million items in more than 50,000 categories
2 billion page views each day
9 petabytes of data in our Hadoop and Teradata clusters
250 million queries each day to our search engine
Cassini
eBay’s new Search Engine
Entirely new codebase
World-class, from a world-class team
Platform for ranking innovation
Four major tracks, 100+ engineers
Likely launch in 2012
Indexing in Cassini
Index with more data and more history
More computationally expensive work at index-time (and less at query-time)
Ability to rescore and reclassify entire site inventory
The entire site inventory is stored in HBase
Indexes are built via MapReduce jobs and stored in HDFS
Build the entire site inventory in hours
HBase Table Data Import
Bulk Load
  Batch processing on demand or every couple of hours
  Load a large amount of data quickly
PUT
  Near real-time updates
  Better for updating small amounts of data
  Read after PUT for better random read performance
HBase Tables
3 major tables: active items, completed items and sellers
15TB data
3600 pre-split regions per table with auto-split disabled
3 column families with a maximum of 200 columns
Automatic major compaction disabled
RowKey is bit reversal of document id (unsigned 64-bit integer)
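The row key and pre-split design above can be sketched as follows. This is a minimal illustration, not eBay's actual code: the class and method names (`ReversedRowKey`, `toRowKey`, `toDocumentId`, `splitKeys`) are hypothetical, and the 8-byte big-endian encoding and evenly spaced split points are assumptions. The likely motivation is that reversing the bits of sequentially assigned document ids scatters neighboring ids across the key space, so evenly spaced split points give regions of similar size without auto-splitting.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical helper: bit-reversed row keys and evenly spaced region split points.
final class ReversedRowKey {

    // Reverse all 64 bits of the document id and encode the result big-endian,
    // so that HBase's unsigned byte-wise ordering matches unsigned numeric order.
    static byte[] toRowKey(long documentId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(Long.reverse(documentId)).array();
    }

    // Bit reversal is its own inverse, so decoding applies it a second time.
    static long toDocumentId(byte[] rowKey) {
        return Long.reverse(ByteBuffer.wrap(rowKey).getLong());
    }

    // Compute (regions - 1) split points that divide the unsigned 64-bit key
    // space into equal ranges. BigInteger avoids signed-long overflow.
    static byte[][] splitKeys(int regions) {
        BigInteger keySpace = BigInteger.ONE.shiftLeft(64);
        byte[][] splits = new byte[regions - 1][];
        for (int i = 1; i < regions; i++) {
            long boundary = keySpace.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(regions))
                                    .longValue(); // keeps the low 64 bits as a bit pattern
            splits[i - 1] = ByteBuffer.allocate(Long.BYTES).putLong(boundary).array();
        }
        return splits;
    }

    public static void main(String[] args) {
        // Consecutive ids end up far apart once their bits are reversed.
        for (long id = 1; id <= 4; id++) {
            System.out.printf("id %d -> key %016x%n", id, Long.reverse(id));
        }
        System.out.println("split points for 3600 regions: " + splitKeys(3600).length);
    }
}
```

In a real deployment the split points would be passed to table creation; that step is omitted here to keep the sketch free of HBase dependencies.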
Indexing Job Pipeline
Full table scan
Run every couple of hours
Numbers
Data import
  Bulk data import: 30 minutes for 500 million full rows
  Random write: ~200,000,000 rows per day
  1.2 TB daily data import
Scan performance
  Scan speed: 2004 rows per second per region server (average version 3), 465 rows per second per region server (average version 10)
  Scan speed with filters: 325~353 rows per second per region server
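To put the import figures above on a common per-second scale, the arithmetic can be worked through as below. This is only a back-of-the-envelope sketch: the class name `ImportRates` is hypothetical, 1 TB is assumed to mean 10^12 bytes, and the daily figures are assumed to be spread evenly over 24 hours, which the slides do not state.

```java
// Hypothetical helper: converts the quoted import totals into per-second rates.
final class ImportRates {

    static final double SECONDS_PER_DAY = 24 * 60 * 60;

    // Average rate for a total amount processed over a duration in seconds.
    static double perSecond(double amount, double seconds) {
        return amount / seconds;
    }

    public static void main(String[] args) {
        // Bulk load: 500 million rows in 30 minutes.
        double bulkRows = perSecond(500_000_000d, 30 * 60);
        // Random writes: about 200 million rows per day (assumed evenly spread).
        double putRows = perSecond(200_000_000d, SECONDS_PER_DAY);
        // Daily import volume: 1.2 TB, assuming 1 TB = 10^12 bytes.
        double megabytes = perSecond(1.2e12, SECONDS_PER_DAY) / 1e6;

        System.out.printf("bulk load:    %.0f rows/s%n", bulkRows);
        System.out.printf("random write: %.0f rows/s%n", putRows);
        System.out.printf("daily import: %.1f MB/s%n", megabytes);
        System.out.printf("bulk vs PUT:  %.0fx%n", bulkRows / putRows);
    }
}
```

Under these assumptions, bulk load sustains roughly two orders of magnitude more rows per second than the averaged random-write rate, which is consistent with using bulk load for large batches and PUT for small, near real-time updates.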
Operations
Monitoring
  Ganglia
  Nagios
  OpenTSDB
Testing
  Unit test and regression test
  HBaseTestingUtility for unit test
  Standalone HBase for regression test (mvn verify)
  Cluster-level fault injection tests [HBASE-4925]
Region balancer
Manual major compaction
Operations (Cont’d)
Disable swap
Greatly increase the file descriptor limit and xciever count
Metrics to watch for
  jvm.DataNode.metrics.threadRunnable with netstat: connection leakage
  hbase.regionserver.compactionQueueSize: major/minor compactions
  dfs.datanode.blockReports_avg_time: data block reporting (for too many data blocks)
  network_report: network bandwidth usage (for data locality)
Community Acknowledgement
Eli Collins
Kannan Muthukkaruppan
Karthik Ranganathan
Konstantin Shvachko
Lars George
Michael Stack
Ted Yu
Todd Lipcon