ⓒ 2011 NHN CORPORATION
CUBRID Cluster Introduction
Dong Wang
Platform Development Lab
NHN China
2011.9.22
Agenda
□ Background & Goal
□ CUBRID Cluster basic concept
□ CUBRID Cluster general design
□ Result & Status of each milestone
□ Demo
□ Performance results
□ Pros and cons
□ Next version plan
Background & Goal
□ Background
In Internet portal services, the volume of service data grows very fast and is
rarely deleted (for example, the Café service)
How can we scale out the DB system without modifying applications?
• Get big system power from cheap commodity servers – clustering or grid computing
□ Goal
Support Dynamic Scalability
Location transparency to the applications
Volume size & Performance
• When performance is the same, Cluster can store more data
• When data size is the same, Cluster can provide higher performance
Others
• Global Schema, Distributed Partition, Load Balancing
• Cluster Management Node, Heart beat
Background & Goal (cont.)
As-Is: the DB system architecture is coded into the application's logic; the application decides which SQL goes to which DB server (e.g. DB1/UPDATE tbl01, DB3/UPDATE tbl35, DB1/SELECT tbl01, DB4/SELECT tbl47 against DB1–DB4, each an HA master/slave pair with RO/RW routing).
To-Be: the cluster provides a "single DB view" with multiple access points to applications, and the DB system scales out independently of the applications (linear scalability). Statements such as UPDATE tbl01, UPDATE tbl35, SELECT tbl01, SELECT tbl35, and SELECT tbl47 are issued against the global schema over distributed partitions; HA master/slave support is still to do.
CUBRID Cluster basic concept
Basic Features
• Global schema
• Global database
• Distributed partition
• Global transaction
• Dynamic scalability
• Global serial & global index
Advanced Features (to do)
• Support HA
• Cluster management node
• Deadlock detection
CUBRID Cluster basic concept – Global schema
Global Schema
The global schema is a single representation, or global view, of all nodes, where each node has its own database and local schema (Local Schema #1–#4 on Database #1–#4, each holding local tables such as info, code, author, and contents).
Example statements against the global schema:
SELECT * FROM info, code WHERE info.id = code.id
INSERT INTO contents…
SELECT * FROM contents WHERE auth = (SELECT name FROM author WHERE …)
CUBRID Cluster basic concept – Global database
A global database is a logical concept that represents the databases managed by the CUBRID Cluster system.
Figure: in the physical view, each node hosts local databases (Node #1: DB A, DB B; Node #2: DB A, DB C; Node #3: DB A, DB C, DB D); in the logical view, the DB A instances on all nodes form one Global DB A, and the DB C instances form one Global DB C.
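The logical-to-physical mapping can be sketched in a few lines of Python (a hypothetical illustration, not CUBRID code):

```python
# Minimal sketch of the "global database" concept: one logical name
# maps to local databases hosted on several nodes. The layout below
# mirrors the figure; the function is purely illustrative.

physical = {
    "Node #1": ["DB A", "DB B"],
    "Node #2": ["DB A", "DB C"],
    "Node #3": ["DB A", "DB C", "DB D"],
}

def global_view(physical):
    """Group local databases under their logical (global) name."""
    view = {}
    for node, dbs in physical.items():
        for db in dbs:
            view.setdefault(db, []).append(node)
    return view

# Global DB A spans all three nodes; Global DB C spans two:
assert global_view(physical)["DB A"] == ["Node #1", "Node #2", "Node #3"]
assert global_view(physical)["DB C"] == ["Node #2", "Node #3"]
```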
CUBRID Cluster basic concept – Distributed partition
Figure: in standalone CUBRID (physical view), each database (DB1 on Node #1, DB1 on Node #2) holds its own schema, data, system catalog, and indexes, and each is its own logical view. In CUBRID Cluster, the nodes share a single Global Schema in the logical view, while the data, system catalog, and indexes remain in each physical database.
CUBRID Cluster basic concept – others
□ Global Transaction
A global transaction is divided into several local transactions that run on
different server nodes.
A global transaction ensures that every server node in the CUBRID Cluster is in a
consistent state both before and after the transaction.
Global transaction processing is transparent to the application.
□ Dynamic Scalability
Dynamic scalability allows users to add or remove server nodes in the CUBRID
Cluster without stopping it.
After a new server node is added to the cluster, users can access and query global
tables from that node.
CUBRID Cluster basic concept – User specs
□ Registering Local DB into Global DB (Cluster)
REGISTER NODE 'node1' '10.34.64.64';
REGISTER NODE 'node2' 'out-dev7';
□ Creating Global Table/Global Partition table
CREATE GLOBAL TABLE gt1 (…) ON NODE 'node1';
CREATE GLOBAL TABLE gt2 (id INT PRIMARY KEY, …) PARTITION BY HASH (id)
PARTITIONS 2 ON NODE 'node1', 'node2';
□ DML operations (INSERT/SELECT/DELETE/UPDATE)
□ Dynamic Scalability
-- add a new server node in global database
REGISTER 'node3' '10.34.64.66';
-- adjust data to new server node
ALTER GLOBAL TABLE gt2 ADD PARTITION PARTITIONS 1 ON NODE 'node3';
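How hash partitioning routes a row to a node can be sketched as follows (the modulo hash and node mapping are illustrative assumptions, not CUBRID's actual hash function):

```python
# Minimal sketch of hash-partition routing, as in
# "PARTITION BY HASH (id) PARTITIONS 2 ON NODE 'node1', 'node2'".

NODES = ["node1", "node2"]   # registered server nodes
NUM_PARTITIONS = 2           # PARTITIONS 2

def partition_of(key: int) -> int:
    """Map a partition-key value to a partition number (illustrative)."""
    return key % NUM_PARTITIONS

def node_of(key: int) -> str:
    """Map a partition to the node that stores it."""
    return NODES[partition_of(key) % len(NODES)]

# Rows with different ids land on different nodes:
assert node_of(10) == "node1"   # 10 % 2 == 0 -> partition 0 -> node1
assert node_of(11) == "node2"   # 11 % 2 == 1 -> partition 1 -> node2
```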
CUBRID Cluster general design
CUBRID Cluster general design (DDL/INSERT)
Figure: an application issues, through a broker,
CREATE GLOBAL TABLE gt1 … PARTITION BY HASH ON NODE 'Server1', 'Server2', 'Server3', 'Server4';
against Server #1, which creates the global schema and the distributed partitions across DB1 on Servers #1–#4. A subsequent INSERT INTO gt1 … travels over client-to-server (C2S) communication, and the client workspace is extended to store remote OIDs.
CUBRID Cluster general design (SELECT/DELETE)
Figure: for statements such as UPDATE …, SELECT … FROM gt1 WHERE …, and DELETE …, Server #1 performs a remote scan and remote execution against DB1 on Servers #2–#4 via server-to-server (S2S) communication.
CUBRID Cluster general design (COMMIT)
Figure: on COMMIT, the server at 10.34.64.64 acts as the coordinator of a two-phase commit (2PC) across the participants at 10.34.64.65, 10.34.64.66, and 10.34.64.67 (the INSERT gt1 and SELECT … FROM … statements touched all four DB1 instances). The coordinator's global transaction index (e.g. 0x40430000, index 0) maps to each participant's local transaction id (local 2, 3, 5, and 1 on Server1–Server4).
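The commit protocol in the figure can be illustrated in Python (class and method names are hypothetical; CUBRID's actual 2PC is implemented inside the server):

```python
# Minimal sketch of two-phase commit (2PC): the coordinator asks every
# participant to prepare, and commits only if all of them vote yes.

class Participant:
    def __init__(self, name):
        self.name = name
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: write a prepare log record, then vote.
        self.state = "prepared"
        return True

    def commit(self):
        # Phase 2: make the local transaction durable.
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants) -> bool:
    # Phase 1: collect votes from all participants.
    if all(p.prepare() for p in participants):
        # Phase 2: everyone voted yes -> commit everywhere.
        for p in participants:
            p.commit()
        return True
    # Any "no" vote aborts the whole global transaction.
    for p in participants:
        p.rollback()
    return False

nodes = [Participant(f"Server{i}") for i in range(1, 5)]
assert two_phase_commit(nodes)
assert all(p.state == "committed" for p in nodes)
```

Note that each participant writes log records in both phases, which is exactly the extra I/O cost listed later under "Cons".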
CUBRID Cluster general design (dynamic scale-out)
Figure: the application first creates a table partitioned across three nodes:
CREATE GLOBAL TABLE gt2 … PARTITION BY HASH ON NODE 'Server1', 'Server2', 'Server3';
then scales out with:
REGISTER 'Server4' '10.34.64.67';
ALTER GLOBAL TABLE gt2 ADD PARTITION … ON NODE 'Server4';
Server #1 syncs up the global schema to the new node and rehashes data onto it.
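The rehash step can be sketched as follows (a naive modulo scheme for illustration; CUBRID's actual redistribution logic may differ):

```python
# Minimal sketch of "rehash" when partitions are added on a new node:
# rows whose key now hashes to a different partition must be moved.

def rehash(keys, old_parts: int, new_parts: int):
    """Return ({partition: [keys]}, [moved keys]) after growing the
    partition count from old_parts to new_parts."""
    moved = []
    placement = {p: [] for p in range(new_parts)}
    for key in keys:
        old_p = key % old_parts
        new_p = key % new_parts
        placement[new_p].append(key)
        if new_p != old_p:
            moved.append(key)   # this row has to travel to another partition
    return placement, moved

placement, moved = rehash(range(12), old_parts=3, new_parts=4)
# Every key ends up in the partition its new hash dictates:
assert all(k % 4 == p for p, keys in placement.items() for k in keys)
```

With plain modulo hashing most keys change partition on a resize, which is why redistribution after ALTER … ADD PARTITION is an expensive operation.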
CUBRID Cluster general design (ORDER BY – ongoing)
Figure: for SELECT … FROM gt1 ORDER BY …, Server #1 sends the remote query, including the ORDER BY, to each server (Step 1); each server scans and sorts its local DB1 partition (Step 2); Server #1 then merges the sorted results from each server (Step 3).
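Step 3, merging the pre-sorted per-server streams, can be sketched with a k-way merge (illustrative Python, not CUBRID code):

```python
# Minimal sketch of merging the sorted result streams returned by
# each server into one globally ordered result.
import heapq

# Each server has already scanned and sorted its own partition (Steps 1-2):
server_results = [
    [1, 4, 9],    # sorted rows from one server
    [2, 3, 10],   # sorted rows from another
    [5, 6, 7],
]

# heapq.merge lazily merges k sorted streams in O(n log k),
# so the coordinator never has to re-sort the full result set.
merged = list(heapq.merge(*server_results))
assert merged == [1, 2, 3, 4, 5, 6, 7, 9, 10]
```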
The result & status of each milestone
CUBRID Cluster Project Overview
□ Team Composition & Roles
Service Platform and Development Center, NHN Korea
• Architect: Park Kiun (Architect/SW)
Platform Development Lab, NHN China
• Project Manager : Baek Jeonghan (Director)/Li Chenglong (Team Leader)
• Dev leader: Li Chenglong (Team Leader) /Wang Dong (Part Leader)
□ Project Duration
May, 2010 ~ Oct, 2011
□ Quality requirement
Passed all CUBRID regression test cases
Passed all CUBRID Cluster QA and dev function test cases
Passed QA performance test cases
□ Others:
Code based on CUBRID 8.3.0.0337 (release version)
The result & status of each milestone – Overview
Timeline (May 2010 – Dec 2011):
M1 Global Schema: May 24, 2010 – Oct 20, 2010
M2 Distributed partition: Oct 21, 2010 – Mar 25, 2011
M3 Performance: Mar 28, 2011 – Jul 17, 2011
M4 nReport (ongoing): Jul 18, 2011 – Oct 30, 2011
Next version: 2011 H2 and later
The result & status of each milestone – M1
□ Achievements:
Open source on sf.net (including code, wiki, bts, forum)
General design for CUBRID Cluster
Implement global database
Implement system catalog extension and support global table DDL
Support basic DML statements (INSERT/SELECT/DELETE/UPDATE) for global tables
Support S2S communication (server-to-server data transfer)
□ Others:
Source lines of code (LOC): 19,246 (11,358 added, 817 deleted, 7,071 modified)
Chinese messages added (LOC): 7,507
BTS issues: 178
Subversion check-ins: 387
The result & status of each milestone – M2
□ Achievements:
Implement distributed partition table by hash (basic DDL and DML)
Support constraints (global index, primary key, unique) and queries with indexes
Support global serial
Support global transaction (commit, rollback)
Refactor s2s communication (add s2s communication interface and connection
pooling)
Support all SQL statements for café service
Passed QA functional testing
□ Others:
Source lines of code (LOC): 20,242 (8,670 added, 4,385 deleted, 7,187 modified)
BTS issues: 241
QA bugs fixed: 43
Subversion check-ins: 461
The result & status of each milestone – M3
□ Achievements:
Performance improvement for M2 (DDL, query, server side insert, 2PC)
Refactor global transaction, support savepoint and atomic statement
Implement dynamic scalability (register/unregister node, add/drop partition)
Support load/unloaddb, killtran
Other features: auto increment, global deadlock timeout
Passed QA functional and performance testing
□ Others:
Source lines of code (LOC): 11,518 (7,065 added, 1,092 deleted, 3,361 modified)
BTS issues: 165
QA bugs fixed: 52
Subversion check-ins: 461
The result & status of each milestone – M4 (ongoing)
□ Goal:
Provide data storage engine for nReport Project
Performance improvement for ORDER BY and GROUP BY statements
Support joining a big table with a small table (global partition table joined with a
non-partitioned table)
Demo
Performance Results
□ Test environment
3 server nodes (10.34.64.201/202/204):
• CPU: Intel(R) Xeon(R) E5405 @ 2.00GHz
• Memory: 8 GB
• Network: 1000 Mbps
• OS: CentOS 5.5 (64-bit)
Configuration: data_buffer_pages=1,000,000
Table size: 100,000 and 10,000,000 rows
Data size: 108 MB (total 207 MB) and 9.6 GB (total 30 GB)
Each thread runs 5,000 times
Figure: a Java program on 10.34.64.203 drives 40 threads against the CUBRID Cluster M3 database (14 threads to Node1 at 10.34.64.204, 13 to Node2 at 10.34.64.201, 13 to Node3 at 10.34.64.202), and a second run drives 40 threads against a standalone CUBRID 8.3.0.0337 database on 10.34.64.201.
Performance Results (cont.)
□ Create table statements:
Cluster M3:
• CREATE GLOBAL TABLE t1 (a INT, b INT, c INT, d CHAR(10), e CHAR(100), f CHAR(500), INDEX i_t1_a(a), INDEX i_t1_b(b)) PARTITION BY HASH(a) PARTITIONS 256 ON NODE 'node1', 'node2', 'node3';
CUBRID R3.0:
• CREATE TABLE t1 (a INT, b INT, c INT, d CHAR(10), e CHAR(100), f CHAR(500), INDEX i_t1_a(a), INDEX i_t1_b(b)) PARTITION BY HASH(a) PARTITIONS 256;
□ Test statements:
Select partition key column: SELECT * FROM t1 WHERE a = ?
Select non-partition key column: SELECT * FROM t1 WHERE b = ?
Select non-partition key column by range: SELECT * FROM t1 WHERE b BETWEEN ? AND ?
Insert with auto commit: INSERT INTO T1 VALUES (?,?,?,?,?,?);
Performance Results (cont.)
□ TPS (Transactions Per Second) graphs
• SELECT * FROM t1 WHERE a = ? (column a is indexed and is the partition key)
• SELECT * FROM t1 WHERE b = ? (column b is indexed but is not the partition key)
• SELECT * FROM t1 WHERE b BETWEEN ? AND ? (column b is indexed but is not the partition key)
• INSERT INTO t1 VALUES (?,?,?,?,?,?) (auto commit)
Performance Results (cont.)
□ ART (Average Response Time) graphs (the lower the better)
• SELECT * FROM t1 WHERE a = ? (column a is indexed and is the partition key)
• SELECT * FROM t1 WHERE b = ? (column b is indexed but is not the partition key)
• SELECT * FROM t1 WHERE b BETWEEN ? AND ? (column b is indexed but is not the partition key)
• INSERT INTO t1 VALUES (?,?,?,?,?,?) (auto commit)
Performance Results (cont.)
□ Test environment
Server nodes (10.34.64.49/50 …/58):
• CPU: Intel(R) Xeon(R) E5645 @ 2.40GHz (12 cores)
• Memory: 16 GB
• Network: 1000 Mbps
• OS: CentOS 5.5 (64-bit)
Configuration (cubrid.conf): data_buffer_pages=1,000,000
Table size: 100,000,000 rows (one hundred million)
Data size: 88 GB (total size: 127 GB)
Performance Results (cont.)
Pros and cons
□ Pros
Existing applications can use CUBRID Cluster easily
CUBRID Cluster can store more data, or provide higher performance, than CUBRID
CUBRID Cluster makes it easy to scale out data size
CUBRID Cluster can save cost
Transactions are supported
□ Cons
Joins are not supported
Performance is not good enough yet
• S2S communication may incur network cost
• 2PC writes many log records, incurring I/O cost
Next version plan
□ Tentative Work plan
Performance improvement
Support HA for each server node in CUBRID Cluster
Support load balancing (write to the active server, read from the standby server)
Support distributed partitioning by range/list
Support global users
Others: DB backup/restore
Appendix
□ Why is a SELECT on the partition key not fast enough?
Figure (current behavior): the application sends SELECT … FROM t1 WHERE a = 100 to Server #1, the default server (Step 1); Server #1 sends a remote scan request for the partition t1__p__p2, which is stored on Server #2 (Step 2); Server #2 does the scan (Step 3); the result is fetched back to Server #1 (Step 4).
Figure (improved behavior): the broker sends the request for t1__p__p2 directly to Server #2 (Step 1), which does the scan locally (Step 2); no remote scan is needed.
Appendix (cont.)
□ Why is an INSERT not fast enough?
Figure (current behavior): the application sends INSERT INTO t1 (a, …) VALUES (100, …) to Server #1, but a = 100 should be stored on Server #2; both DB1 instances become dirty, and COMMIT requires a two-phase commit in which Server #1 writes the log 3 times and Server #2 writes it 2 times.
Figure (improved behavior): the broker sends the INSERT directly to Server #2; COMMIT needs no 2PC and writes the log only once.