CUBRID Cluster Introduction


Transcript of CUBRID Cluster Introduction

Page 1: CUBRID Cluster Introduction

ⓒ 2011 NHN CORPORATION

CUBRID Cluster Introduction

Dong Wang

Platform Development Lab

NHN China

2011.9.22

Page 2: CUBRID Cluster Introduction


Agenda

□ Background & Goal

□ CUBRID Cluster basic concept

□ CUBRID Cluster general design

□ Result & Status of each milestone

□ Demo

□ Performance results

□ Pros and cons

□ Next version plan


Page 3: CUBRID Cluster Introduction


Background & Goal


Page 4: CUBRID Cluster Introduction


Background & Goal

□ Background

In Internet portal services, the volume of service data keeps increasing very fast, without deletion (e.g., the Café service)

How can the DB system be scaled out without modifying the applications?

• Build big system power from cheap commodity servers – clustering or grid computing

□ Goal

Support Dynamic Scalability

Location transparency to the applications

Volume size & Performance

• When performance is the same, Cluster can store more data

• When data size is the same, Cluster can provide higher performance

Others

• Global Schema, Distributed Partition, Load Balancing

• Cluster Management Node, Heartbeat


Page 5: CUBRID Cluster Introduction


Background & Goal (cont.)

As-Is (diagram): the application sends each statement to a specific RO/RW HA master–slave pair among DB1–DB4 (e.g., DB1/UPDATE tbl01, DB1/SELECT tbl01, DB3/UPDATE tbl35, DB4/SELECT tbl47). The DB system architecture is coded in the application's logic: the application decides which SQL goes to which DB server.

To-Be (diagram): the application sends the same statements (UPDATE tbl01, UPDATE tbl35, SELECT tbl01, SELECT tbl35, SELECT tbl47) to one clustered DB built on a global schema and distributed partitions (HA master–slave support is still to do). The cluster provides a "single DB view" with multiple access points to the applications, so the DB system scales out independently of the applications (linear scalability).


Page 6: CUBRID Cluster Introduction


CUBRID Cluster basic concept


Page 7: CUBRID Cluster Introduction


CUBRID Cluster basic concept

Basic Features

• Global schema
• Global database
• Distributed partition
• Global transaction
• Dynamic scalability
• Global serial & global index

Advanced Features (to do)

• HA support
• Cluster management node
• Deadlock detection


Page 8: CUBRID Cluster Introduction


CUBRID Cluster basic concept – Global schema

Global Schema (diagram): Local Schema #1–#4 on Database #1–#4 each hold their own local tables (contents, author, info, code, …), and the global schema presents them as one view. Example statements issued against the global schema:

SELECT * FROM info, code WHERE info.id = code.id

INSERT INTO contents …

SELECT * FROM contents WHERE auth = (SELECT name FROM author WHERE …)

The global schema is a single representation, or global view, of all nodes, where each node has its own database and schema.
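As a minimal sketch of the idea (node names, IP addresses, and column lists below are illustrative assumptions, borrowing the REGISTER NODE / CREATE GLOBAL TABLE syntax from the user-spec slide later in this deck):

-- register two server nodes into the cluster
REGISTER NODE 'node1' '10.34.64.64';
REGISTER NODE 'node2' '10.34.64.65';

-- each table is physically stored on one node but appears in the global schema of every node
CREATE GLOBAL TABLE author (id INT PRIMARY KEY, name CHAR(50)) ON NODE 'node1';
CREATE GLOBAL TABLE contents (id INT PRIMARY KEY, auth CHAR(50)) ON NODE 'node2';

-- issued from any node; the global schema resolves where each table is stored
SELECT * FROM contents WHERE auth = (SELECT name FROM author WHERE id = 1);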


Page 9: CUBRID Cluster Introduction


CUBRID Cluster basic concept – Global database

Diagram: physically, the nodes host local databases (Node #1: DB A and DB B; Node #2: DB A and DB C; Node #3: DB A and DB D); logically, the DB A instances across the nodes are presented as one Global DB A, and the DB C instances as one Global DB C.


The global database is a logical concept that represents a database managed by the CUBRID Cluster system.

Page 10: CUBRID Cluster Introduction

Distributed partition concept


Diagram: logically, there is one Global Schema with its data, system catalog, and index; physically, DB1 ON NODE #1 and DB1 ON NODE #2 each hold their own schema, data, system catalog, and index, and together they store the distributed-partitioned data.


Page 11: CUBRID Cluster Introduction


CUBRID Cluster basic concept – Others

□ Global Transaction

A global transaction is divided into several local transactions that run on different server nodes.

A global transaction ensures that every server node in the CUBRID Cluster is consistent both before and after the transaction.

The processing of a global transaction is transparent to the application (see the sketch after this list).

□ Dynamic Scalability

Dynamic scalability allows the user to add or remove server nodes in the CUBRID Cluster without stopping the cluster.

After a new server node is added to the cluster, the user can access and query global tables from this new node.
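A minimal sketch of what this transparency looks like to the application (it assumes the hash-partitioned global table gt2 defined on the next slide, spread over 'node1' and 'node2'; the name column and the row values are illustrative assumptions):

-- the two rows may hash to partitions stored on different server nodes
INSERT INTO gt2 (id, name) VALUES (1, 'a');
INSERT INTO gt2 (id, name) VALUES (2, 'b');
-- a single COMMIT; the cluster coordinates the local transactions on the nodes involved behind the scenes
COMMIT;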


Page 12: CUBRID Cluster Introduction


CUBRID Cluster basic concept – User specs

□ Registering Local DB into Global DB (Cluster)

REGISTER NODE 'node1' '10.34.64.64';

REGISTER NODE 'node2' 'out-dev7';

□ Creating Global Table/Global Partition table

CREATE GLOBAL TABLE gt1 (…) ON NODE 'node1';

CREATE GLOBAL TABLE gt2 (id INT PRIMARY KEY, …) PARTITION BY HASH (id) PARTITIONS 2 ON NODE 'node1', 'node2';

□ DML operations (INSERT/SELECT/DELETE/UPDATE), for example:
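A minimal sketch of these operations against gt2 (assuming, for illustration only, a second column name besides id):

INSERT INTO gt2 (id, name) VALUES (3, 'c');   -- routed by hash(id) to the node holding that partition
SELECT * FROM gt2 WHERE id = 3;               -- the same statement works unchanged from any node
UPDATE gt2 SET name = 'd' WHERE id = 3;
DELETE FROM gt2 WHERE id = 3;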

□ Dynamic Scalability

-- add a new server node in global database

REGISTER 'node3' '10.34.64.66';

-- adjust data to new server node

ALTER GLOBAL TABLE gt2 ADD PARTITION PARTITIONS 1 ON NODE 'node3';


Page 13: CUBRID Cluster Introduction


CUBRID Cluster general design


Page 14: CUBRID Cluster Introduction


CUBRID Cluster general design (DDL/INSERT)

Diagram: applications connect through brokers to Server #1, which hosts Global DB1; Servers #2–#4 each hold a local DB1 as well.

CREATE GLOBAL TABLE gt1 … PARTITION BY HASH ON NODE 'Server1', 'Server2', 'Server3', 'Server4';
creates the distributed partitions and registers the table in the global schema on all nodes.

INSERT INTO gt1 … is sent over C2S (client-to-server) communication; the client workspace is extended to store remote OIDs.


Page 15: CUBRID Cluster Introduction

CUBRID Cluster general design (SELECT/DELETE)


Diagram: the application sends SELECT .. FROM gt1 WHERE …, UPDATE …, or DELETE … through a broker to Server #1. Because the partitions of gt1 are spread over the DB1 instances on Servers #1–#4, Server #1 performs remote scans and remote execution on the other servers over S2S (server-to-server) communication.


Page 16: CUBRID Cluster Introduction


CUBRID Cluster general design (COMMIT)

Diagram: the application runs INSERT gt1 …, SELECT … FROM …, and COMMIT through a broker against the server at 10.34.64.64 (Server1), which acts as the coordinator; the servers at 10.34.64.65/66/67 (Server2–Server4) are participants, and the commit is executed as a two-phase commit (2PC) across their DB1 instances.

Global transaction index table on the coordinator (global index 0x40430000):

Index  Server1  Server2  Server3  Server4
0      2        3        5        1

The entries are the participants' local transaction indexes (Local:2, Local:3, Local:5, Local:1); each participant keeps a corresponding index table of its own that maps back to the coordinator.


Page 17: CUBRID Cluster Introduction


CUBRID Cluster general design (dynamic scale-out)

Diagram: gt2 was created with CREATE GLOBAL TABLE gt2 … PARTITION BY HASH ON NODE 'Server1', 'Server2', 'Server3';. A new node is added with REGISTER 'Server 4' '10.34.64.67'; followed by ALTER GLOBAL TABLE gt2 ADD PARTITION … ON NODE 'Server 4';. The cluster then syncs up the global schema to Server #4 and rehashes the data of gt2 across the DB1 instances on all four servers, while applications keep working through the brokers.


Page 18: CUBRID Cluster Introduction


CUBRID Cluster general design (ORDER BY – Ongoing)

Diagram: the application sends SELECT .. FROM gt1 ORDER BY … through a broker to Server #1, which forwards the remote query, including the ORDER BY, to the servers holding partitions of gt1 (Servers #2–#4).

Step 1: each server scans its own partition.

Step 2: each server sorts its local result.

Step 3: Server #1 merges the sorted results from each server.
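A minimal sketch of the kind of statement this design targets (gt1 is the distributed-partition table from the earlier design slides; the ordering column b is an illustrative assumption):

-- step 1: each server holding a partition of gt1 scans its local rows
-- step 2: each server sorts its local result
-- step 3: the server that received the query merges the sorted streams into the final result
SELECT * FROM gt1 ORDER BY b;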


Page 19: CUBRID Cluster Introduction


The result & status of each milestone


Page 20: CUBRID Cluster Introduction


CUBRID Cluster Project Overview

□ Team Composition & Roles

Service Platform and Development Center, NHN Korea

• Architect: Park Kiun (Architect/SW)

Platform Development Lab, NHN China

• Project Manager : Baek Jeonghan (Director)/Li Chenglong (Team Leader)

• Dev leader: Li Chenglong (Team Leader) /Wang Dong (Part Leader)

□ Project Duration

May, 2010 ~  Oct, 2011

□ Quality requirement

Passed all CUBRID regression test cases

Passed all CUBRID Cluster QA and dev function test cases

Passed QA performance test cases

□ Others:

Code based on CUBRID 8.3.0.0337 (release version)


Page 21: CUBRID Cluster Introduction

The result & status of each milestone – Overview


Timeline (2010 H1 – 2011 H2):

M1 Global Schema: May 24th, 2010 – Oct 20th, 2010

M2 Distributed partition: Oct 21st, 2010 – Mar 25th, 2011

M3 Performance: Mar 28th, 2011 – Jul 17th, 2011

M4 nReport (ongoing): Jul 18th, 2011 – Oct 30th, 2011

Next version: follows M4


Page 22: CUBRID Cluster Introduction


The result & status of each milestone – M1

□ Achievements:

Open source on sf.net (including code, wiki, BTS, forum)

General design for CUBRID Cluster

Implement global database

Implement system catalog extension and support global table DDL

Support basic DML statements (insert/select/delete/update) for global tables

Support S2S communication (server-to-server data transfer)

□ Others:

Source lines of code (LOC): 19,246 (added 11,358, deleted 817, modified 7,071)

Chinese messages added (LOC): 7,507

BTS issues: 178

Subversion check-ins: 387


Page 23: CUBRID Cluster Introduction


The result & status of each milestone – M2

□ Achievements:

Implement distributed partition tables by hash (basic DDL and DML)

Support constraints (global index, primary key, unique) and queries using indexes

Support global serial

Support global transactions (commit, rollback)

Refactor S2S communication (add an S2S communication interface and connection pooling)

Support all SQL statements needed by the Café service

Passed QA functional testing

□ Others:

Source lines of code (LOC): 20,242 (added 8,670, deleted 4,385, modified 7,187)

BTS issues: 241

QA bugs fixed: 43

Subversion check-ins: 461


Page 24: CUBRID Cluster Introduction


The result & status of each milestone – M3

□ Achievements:

Performance improvement for M2 (DDL, query, server side insert, 2PC)

Refactor global transaction, support savepoint and atomic statement

Implement dynamic scalability (register/unregister node, add/drop partition)

Support load/unloaddb, killtran

Other features: auto increment, global deadlock timeout

Passed QA functional and performance testing

□ Others:

Source lines of code (LOC): 11,518 (added 7,065, deleted 1,092, modified 3,361)

BTS issues: 165

QA bugs fixed: 52

Subversion check-ins: 461


Page 25: CUBRID Cluster Introduction


The result & status of each milestone – M4 (Ongoing)

□ Goal:

Provide the data storage engine for the nReport project

Performance improvement for ORDER BY and GROUP BY statements

Support joining a big table with a small table (global partitioned table joined with a non-partitioned table)


Page 26: CUBRID Cluster Introduction


Demo


Page 27: CUBRID Cluster Introduction


Performance Results

□ Test environment

3 Server nodes (10.34.64.201/202/204):

• CPU: Intel(R) Xeon(R) CPU E5405 @ 2.00GHz

• Memory: 8G

• Network: 1000 Mbps

• OS: CentOS 5.5 (64-bit)

Configuration: data_buffer_pages=1,000,000

Table size: 100,000 and 10,000,000 rows

Data size: 108M (total 207M) and 9.6G (total 30G)

Each thread runs 5,000 times

Diagram – CUBRID Cluster M3: a Java program on 10.34.64.203 drives 40 threads against the cluster database; the threads are split 14/13/13 across Node1 (10.34.64.204), Node2 (10.34.64.201), and Node3 (10.34.64.202).

Diagram – CUBRID 8.3.0.0337: the same Java program on 10.34.64.203 drives 40 threads against a single CUBRID DB on 10.34.64.201.


Page 28: CUBRID Cluster Introduction


Performance Results (cont.)

□ Create table statements:

Cluster M3:

• CREATE GLOBAL TABLE t1 (a INT, b INT, c INT, d CHAR(10), e CHAR(100), f CHAR(500), INDEX i_t1_a(a), INDEX i_t1_b(b)) PARTITION BY HASH(a) PARTITIONS 256 ON NODE 'node1', 'node2', 'node3';

CUBRID R3.0:

• CREATE TABLE t1 (a INT, b INT, c INT, d CHAR(10), e CHAR(100), f CHAR(500), INDEX i_t1_a(a), INDEX i_t1_b(b)) PARTITION BY HASH(a) PARTITIONS 256;

□ Test statements:

Select by partition key column: SELECT * FROM t1 WHERE a = ?

Select by non-partition key column: SELECT * FROM t1 WHERE b = ?

Select by non-partition key column range: SELECT * FROM t1 WHERE b BETWEEN ? AND ?

Insert with auto commit: INSERT INTO t1 VALUES (?,?,?,?,?,?)


Page 29: CUBRID Cluster Introduction


Performance Results (cont.)

□ TPS (Transactions Per Second) graphs for:

SELECT * FROM t1 WHERE a = ? (column a is indexed and is the partition key)

SELECT * FROM t1 WHERE b = ? (column b is indexed but is not the partition key)

SELECT * FROM t1 WHERE b BETWEEN ? AND ? (column b is indexed but is not the partition key)

INSERT INTO t1 VALUES (?,?,?,?,?,?) (auto commit)

(The TPS graphs themselves are shown in the original slides.)


Page 30: CUBRID Cluster Introduction


Performance Results (cont.)

□ ART (Average Response Time) graphs – the lower the better – for:

SELECT * FROM t1 WHERE a = ? (column a is indexed and is the partition key)

SELECT * FROM t1 WHERE b = ? (column b is indexed but is not the partition key)

SELECT * FROM t1 WHERE b BETWEEN ? AND ? (column b is indexed but is not the partition key)

INSERT INTO t1 VALUES (?,?,?,?,?,?) (auto commit)

(The ART graphs themselves are shown in the original slides.)


Page 31: CUBRID Cluster Introduction


Performance Results (cont.)

□ Test environment

Server nodes (10.34.64.49/50 …/58):

• CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (12 cores)

• Memory: 16G

• Network: 1000 Mbps

• OS: CentOS 5.5 (64-bit)

Configuration:

• cubrid.conf: data_buffer_pages=1,000,000

Table size: 100,000,000 rows (one hundred million)

Data size: 88G (total size: 127G)


Page 32: CUBRID Cluster Introduction


Performance Results (cont.)


Page 33: CUBRID Cluster Introduction


Pros and cons

□ Pros

Existing applications can use CUBRID Cluster easily

CUBRID Cluster can store more data, or provide higher performance, than CUBRID

CUBRID Cluster makes it easy to scale out the data size

CUBRID Cluster can save cost

Transactions are supported

□ Cons

Joins are not supported

Performance is not good enough yet

• S2S communication adds network cost

• 2PC writes many log records, which adds I/O cost


Page 34: CUBRID Cluster Introduction


Next Version plan

□ Tentative Work plan

Performance improvement

Support HA for each server node in CUBRID Cluster

Support load balancing (write to the active server / read from the standby server)

Support distributed partition by range/list

Support global user

Others: DB backup/restore


Page 35: CUBRID Cluster Introduction


Appendix

□ Why is a select by partition key not fast enough? (back)


Current flow (diagram):

Step 1: the application sends SELECT .. FROM t1 WHERE a = 100 through the broker to Server #1 (the default server).

Step 2: Server #1 sends a remote scan request, SELECT .. FROM t1__p__p2 WHERE a = 100, to Server #2, because partition p2 is stored on Server #2.

Step 3: Server #2 does the scan.

Step 4: the results are fetched back to Server #1.

Compared flow (diagram, no remote scan):

Step 1: the request for partition p2 (SELECT .. FROM t1__p__p2 WHERE a = 100) is sent to Server #2 directly.

Step 2: Server #2 does the scan; no remote scan or fetch-back is needed.


Page 36: CUBRID Cluster Introduction


Appendix (cont.)

□ Why insert is not fast enough? (BACK)

Current flow (diagram): the application sends INSERT t1 (a, …) VALUES (100, …) and COMMIT through the broker to Server #1, but the row with a = 100 should be stored on Server #2. The commit therefore runs as a two-phase commit: Server #1 writes the log 3 times and Server #2 writes the log 2 times.

Compared flow (diagram): when the same INSERT and COMMIT do not need a two-phase commit, the log is written only once (the diagram marks both DB1 instances as "Dirty").
