Database sharding the right way: еasy, reliable, and open source (Esen Sagynov)
Transcript of Database sharding the right way: еasy, reliable, and open source (Esen Sagynov)
Database Sharding the Right Way: Easy, Reliable, and Open
source.
Esen Sagynov (@CUBRID),NHN CorporationService Platform Development Center
Monday, October 22, 2012
Who am I?
• Esen Sagynov (NHN Corp.)– CUBRID Project Manager–MVB at DZone
–@CUBRID– fb.com/cubrid– [email protected]
Growing in the Wild. The story by CUBRID Database
Developers.
Esen Sagynov (@CUBRID),NHN CorporationService Platform Development Center
Monday, April 2, 2012
Eugen Stoianovici,NHN CorporationCUBRID Development Lab
View on Slideshare
http://profyclub.ru/docs/439
We talked about…
• Who is NHN• Reasons behind CUBRID
development• What CUBRID has to offer. Benefits &
advantages.• What we have learnt so far. Where
we are heading to.
CUBRID Facts RDBMS True Open Source @ www.cubrid.org Optimized for Web services High performance Large DB support High-Availability feature DB Sharding support 90+% MySQL compatible SQL syntax + Oracle
statistics functions ACID Transactions Online Backup Supported by NHN Corporation
Q&A
• Используем ли мы внешние библиотеки, как STL или Boost, в CUBRID?
• Каким образом CUBRID будет выполнять сложные SQL запросы, поддерживать транзакции и ACID во множественных серверах в условиях горизонтального масштабирования?
=Big Business Opportunity
How to manage Big Data?
SQL
NoSQL- Open Source- Scalable- Non-standard API
- Enterprise- Vendor dependency- Scalability con-
straints- Common interface
Big Data = NoSQL?• Uses RDBMS with Sharding• Data is stored as simple Key-Value.
• Uses RDBMS with Sharding• Sharding and Replication is abstracted through Gizzard
• Uses RDBMS with Sharding• Hbase usage is limited
• Uses RDBMS to store data• Data caching in a variety of ways
• Uses RDBMS with Sharding• ACID is the reason to use RDBMS
• Uses RDBMS with Sharding• Easier to implement, best suits their needs
• Uses RDBMS with Sharding and HA• Data consistency and relationship are the reason
What NoSQL lacks?
SQL
Transactions
NoSQL => NoACID
Standard Interface
Experts
Oracle DB Market Share
2009 2010 201140%45%50%55%60%65%70%
KoreaWorldwide
DBMSMarket$MM
Worldwide 21,359 23,252 26,701 11.8%
Korea 349 395 478 17%
Ratio
1.6% 1.7% 1.8%
Source: Gartner, 2012
RDBMS is still the best choice formission-critical data
How to manage Big data with RDBMS?
Database Sharding
Database Sharding
• Partitioning
Divide the data between
multiple tables within one
Database Instance
• Sharding
Divide the data between
multiple tables created in
separate Database Instances
DB
X Y Z
DB
X
DB
Y
DB
Z
Shard
DB
“User” Ta-ble
id
name
age
1 Jackie
10
2 Bruce
12
3 Chuck
13
4 Billy 14
Shard #1
“User” Ta-ble
id
name
age
1 Jackie
10
2 Bruce
12
DB Sharding
Shard #2
“User” Ta-ble
id
name
age
3 Chuck
13
4 Billy 14
Database Sharding
Sharding SolutionsName Type Requirements Inter-
faceDB ETC
Hibernate shards AS framework
DBMS w/Hibernatesupport
- Hiber-nate
- JVMJava
dbShards AS & Middle-ware MySQL Java, C
Gizzard (Twitter) Middleware Any stor-age
- JVM Java
Spider for MySQL
Middleware &Storage Engine MySQL Any
CUBRID SHARD Middleware
- CUBRID
- MySQL- Oracle
Any
Disadvantages of Existing Solu-tions
• Third-party• Separate installation/hardware• Painful configuration and management• Additional dependency– RDBMS upgrade?– Sharding solution upgrade?
• Generating Unique ID?• Auto-rebalancing?• HA?
The Ideal Solution
• A single open source RDBMS which provides native support for scalability:– database sharding– connection pooling– load balancing– data auto-rebalancing– high-availability
CUBRID 9.0
Scalability with CUBRID
CUBRID 8.4.1connection poolingload balancinghigh-availability
CUBRID 9.0 (code name Apricot)database sharding
CUBRID next (code name Banana)data auto-rebalancing
Universal Sharding
• Supports:
coming soon…
Native DB Sharding
JDBC
User Apps
CCI
User Apps
CUBRID SHARD middleware
shard #0 shard #N……
CUBRID
How to install CUBRID SHARD?
Install CUBRID 9.0http://www.cubrid.org/downloads
CUBRID SHARD Configura-tion
• shard.conf– Default configuration file for CUBRID
SHARD.
• shard_connection.txt– Predefined list of shard IDs, database
and host names for CUBRID/MySQL.
• shard_keys.txt– A list of shard_key_columns and their
mapping with shard_id
shard.conf
SHARD_KEY_MODULAR = 256SHARD_KEY_LIBRARY_NAME = ‘’SHARD_KEY_FUNCTION_NAME = ‘’
shard_connection.txt
shard_keys.txt
Shard Key Column name id user_id order_no …
=
Custom Libraryint user_get_shard_key(int type, void *val){ int mod = 2;
if (val == NULL) {
return ERROR_ON_ARGUMENT; }
switch(type) {
case SHARD_U_TYPE_INT:{ int ival; ival = (int) (*(int *)val); return ival % 2;} break;case SHARD_U_TYPE_STRING: return ERROR_ON_MAKE_SHARD_KEY;default: return ERROR_ON_ARGUMENT;
} return ERROR_ON_MAKE_SHARD_KEY;}
Configuring CUBRID SHARD is very easy!
#1) Create Shards• Host 1..N:
$> cubrid createdb shard1$> csql -S -u dba shard1 -c "create user shard password 'shard123’”$> cubrid server start shard1
#2) Create same tables
$> csql -C -u shard -p 'shard123' shard1@localhost -c ”CREATE TABLE users (id BIGINT PRIMARY KEY, name VARCHAR(20), age SMALLINT)”
• Host 1..N:
#3) Start CUBRID SHARD
$> cubrid shard start@ cubrid shard start ++cubrid shard start: success
Standard Connection URL
connectionURL ="jdbc:cubrid:localhost:45511:shard1:shard:shard123:";
DB name
password
Shard broker port
username
String query = "SELECT name FROM student WHERE student_no = /*+ shard_key */ ?; ";PrepareStatement query_stmt = connection.prepareStatement(query);query_stmt.setInt(1,100);ResultSet rs = query_stmt.executeQuery();// fetch resultset
2. Query analysis3. shard_key hashing4. Passing the query to the selected shard.Shard
proxy
shard #0 shard #1 shard #2 shard #4
key_columnrange
(hash result) shard_idmin max
student_no 0 63 0
student_no 64 127 1
student_no 128 191 2
student_no 192 255 3
Shard selection
1. Execute queryClient app
Querying Shards
SELECT name FROM student WHERE student_no = /*+ shard_key */ ?;
SQL hint
Shard key column
• bind variable• fixed value
Types of SQL Hints
SQL Hints Description
/*+ shard_key */ a hint to specify the location of - a bind variable - or the literal valuewhich corresponds to the shard key column
/*+ shard_val(value) */
a hint to explicitly specify the shard key in case thecolumn that corresponds to the shard key does notexist in a query
/*+ shard_id(shard_id) */
A hints which can be used to directly processuser queries on a particular shard
How did we tackle theunique ID problem?
Generating Unique IDs
• DB Ticket Server– SERIAL (max = 1037 for ascending serial)– Sortable– 64 bits (BIGINT)– SERIAL CACHE (ON/OFF) for improved performance– Auto fail-over through HA
Other ways
• Create a generic table with BIGINT AUTO_INCREMENT PRIMARY KEY column (also “fail-over”-able)
• Generate IDs in Web app• Dedicated Service like Twitter’s
Snowflake
Does CUBRID SHARD support HA?
Yes!
CUBRID SHARD in HA
CUBRID SHARD middle-ware
shard #0
Client app
shard #1 shard #2 shard #3
Master
Slave
shard #0 shard #1 shard #2 shard #3
= =sync / async / semi-sync + auto fail-over
…. ….Client app Client app
CUBRID SHARD Performance
nGrinderCon-
troller
nGrinderAgent
nGrinderAgent
nGrinderAgent
nDrive App
Simulator(DBGW
lib)
nDrive App
Simulator(DBGW
lib)
nDrive App
Simulator(DBGW
lib)
http
ShardProxy
Broker
Meta DB 1
Meta DB 2
Meta DB 3
Meta DB 4
User DB
CAS- CUBCCI - CAS
8 agents and simu-
lators
Description Quantity OS (64bit) / CPU / MEM
Agent to generatload andNDrive App Simula-tor
8 Centos5.3 / xeon 2G-8core / 8G
CUBRID Shard 1 Centos5.3 / xeon 2.27G-16core / 24G
CUBRID Broker 1 Centos5.3 / xeon 2.27G-16core / 24G
Meta DB 4 Centos5.x / xeon 2.33G-4core / 8G
User DB 1 Centos5.3 / xeon 2.5G-8core / 8G
Meta DB
User DB
Broker
Shard Proxy
nGrinderAgent
nDrive App
Simulator
Performance Test ScenarioTest environment and hardware configurations
Test scenario #1: Maximum performance test
- Check if it can handle the load of at least 60,000 RPS
RPS = Request Per Second (Throughput)Vuser = # of concurrent users 32 64 96 128 160 192 256 320 384 448 512
020000400006000080000
100000
Load Generator Per-formance
# of concurrent users
RP
S
Increase in Vuser leads to in-crease:- of response time- of CPU Load on Proxy
hardware
After 320 Vuser, RPS de-creases- Proxy server load
(CPU load of the Simulator hard-ware ismaintained under 40%)
64 128 192 256 320010203040506070
0
10000
20000
30000
40000
50000
60000
Performance trend when load is increased
proxy cpu Mean Time(ms)RPS metadb TPS
Max performance conditions- 256 Vuser- 13981.98 RPS- 4 Proxy processes => 200 CAS(If # of Proxy processes is in-creased, performance can be in-creased)
Test Results (1)
Test scenario #2: Performance results compar-ison when: - SHARD is used - is NOT used
- When SHARD is not used, Broker is used.
Test Results (2)
64 128 192 256 3200
10
20
30
40
50
60
70
80
90
0
2000
4000
6000
8000
10000
12000
14000
16000
SHARD vs. Broker Performance Comparison
Broker - broker cpuShard - proxy CPUBroker - Mean Time(ms)Shard - Mean Time(ms)Broker - RPSShard - RPS
Vuser
RP
S
- Similar performance until 128 Vuser- When SHARD is not used, 128 Vuser
is maximum- In SHARD usage case, when # of Vuser is
increase- maximum performance can be
achieved as well as shorter re-sponse time andlower CPU utilization.
TPC-C Performance Test
Test Scenario
• 9 tables– customer (300000 records)– district (100 records)– history (300000 records)– item (100000 records)– new_order (0 records)– order_line (3000000 records)– orders (300000 records)– stock (1000000 records)– warehouse (10 records)
• The total number of records > 5 million.
• AWS Xlarge in-stance• 7GB RAM• 20 EC2 units
• Ubuntu 12.04 64-bit
• CUBRID 9.0 (beta)• MySQL 5.5.28
• Buffer• 2.8GB
data_buffer_size• 2.8GB
innodb_pool_size
• Default configura-tions
Test Results
TPC-C Index30
34
38
42
46
42.66
44.18
MySQLCUBRID
CUBRID SHARD is very stableand easy to use!
CUBRID SHARD Features
Single database view No application change Unlimited DB shards Multiple Sharding
Strategies Parameterized queries Shard targeted query
(SQL Hints) Shared Query Plan
Caching
Connection and statement pooling
Load balancing Read only Sharding Broker
1+ Sharding Brokers/Proxies
Native HA support Master-Slave DB
Generic (non-sharded) Tables
Supports CUBRID and MySQL
CUBRID SHARD
• Easy– No configuration hassle– No “moving parts”
• Reliable– High performance– Auto fail-over
• Open source– Supported by NHN
Big data is easy!
What’s next for CUBRID?
CUBRID vNext (Banana)
Auto-rebalancing in CUBRID SHARDPerformance+++SQL Compatibility+++SQL Monitoring++
Esen SagynovCUBRID Project Manager
www.cubrid.org
www.facebook.com/cubrid www.twitter.com/cubrid
CUBRID Q&A www.cubrid.org/questions