Architecture of NDB - MySQL

Page 1: Architecture of NDB

Mikael Ronström, Senior MySQL Architect, MySQL AB

Copyright 2007 MySQL AB. The World’s Most Popular Open Source Database

Page 2: MySQL Cluster Architecture

Scalability on three levels:
1. Number of applications
2. Number of MySQL servers
3. Number of database nodes

[Diagram labels: three Applications, three MySQL Servers, and the NDB Kernel (database nodes) with four DB nodes, together forming the MySQL Cluster]

Page 3: MySQL Cluster Overview

• Main memory
– Complemented with logging to disk
• Clustered
– Database distributed over many nodes
– Synchronous replication between nodes
– Database automatically partitioned over the nodes
– Fail-over capability in case of node failure
• Update in one MySQL server, immediately see result in another
• Support for high update loads (as well as very high read loads)

Page 4: MySQL Cluster Value

• Bringing clustered data management to a broader range of customers
– Affordable
– Ease of use
– Upgrade path from current MySQL installations
• Making high-availability mainstream
– On-going standardization efforts assume off-the-shelf components
– New database technology using main-memory storage with replication gives persistency and short fail-over times

Page 5: MySQL Cluster Features (1)

• Support for mixed main memory and disk-based columns (indexed columns always in main memory)
• Transactions
• Synchronous Replication
• Auto-synch at restart of NDB node
• Recovery from checkpoints and logs at cluster crash
• Subscription to row changes
• Indexes (Unique hash and T-tree ordered)

Page 6: MySQL Cluster Features (2)

• On-line Backup
• On-line Add Index
• On-line Drop Index
• On-line Add Column
• On-line SW Upgrade
• Asynchronous replication between clusters
• Conflict Resolution in Multi-Master Replication

Page 7: History

• Requirement collection, prototypes and research 1991-1996
• Draws from Ericsson’s extensive experience in building reliable solutions for the Telecom/IP industry
• Development start 1996 H2
• Version 0.1 demo August 1997 (primary key lookups and updates)
• Version 0.2 February 1998 (transaction support)
• Version 0.3 October 1998 (Cluster crash support)
• 1998: Prototype of Web Cache Server using Java NDB API
• Version 1.0 April 2001 (Perl DBI, scan, ..)
• Version 1.1 August 2001 (First production release)
• Version 1.41 May 2002 (Node Recovery, Production release)
• Version 2.11 May 2003 (ODBC, On-line Backup, Unique Index)
• Version 2.12.3 February 2004 (production release for B2)
• Version 3.4.5 14th April 2004 (MySQL integration, Ordered Index, On-line SW Upgrade) put in production by B2

Page 8: MySQL Cluster Architecture

[Diagram labels: Applications, MySQL Servers, NDB API, Management Client, MGM API, NDB Kernel (database nodes), MGM Server (management nodes)]

Page 9: Data distribution

NDB Cluster: 4 node configuration on 2 dual processor machines

• Horizontal fragmentation of Table 1 (4 fragments)
• Fragments distributed on nodes
• 2 copies of data (Fx – primary replica, Fx – secondary replica)

[Diagram, logical configuration: Node group 1 and Node group 2 containing Nodes 1 to 4; fragments F1 to F4 of Table 1 (columns Pnr, AccNo, Val, $$) each stored on two nodes]
[Diagram, physical configuration: Nodes 1 to 4 on two Dual CPU machines connected by TCP/IP or SCI]
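A minimal Python sketch of the distribution described on this slide, assuming that a hash of the key selects one of the 4 fragments and that each fragment keeps its primary and secondary replica on the two nodes of one node group. The crc32 hash, the PLACEMENT table, and the function names are illustrative assumptions, not NDB's actual algorithm or layout.

```python
import zlib

NUM_FRAGMENTS = 4

# Hypothetical placement: fragment -> (node with primary replica, node with secondary replica).
# Nodes 1 and 2 are taken to form node group 1, nodes 3 and 4 node group 2.
PLACEMENT = {
    1: (1, 2),
    3: (2, 1),
    2: (3, 4),
    4: (4, 3),
}


def fragment_for(key):
    """Pick a fragment (1..4) for a key; crc32 is only a stand-in for the real hash."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_FRAGMENTS + 1


def replicas_for(key):
    """Return (primary node, secondary node) holding the row with this key."""
    return PLACEMENT[fragment_for(key)]
```

Calling replicas_for("acc-1") returns the pair of nodes that would store that row. Both nodes always belong to the same node group, which is what gives 2 copies of the data.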

Page 10: Synchronous Replication: Low failover time

Example showing a transaction with two operations using three replicas

messages = 2 x operations x (replicas + 1)

1. Prepare F1, Prepare F2
2. Commit F1, Commit F2

[Diagram: for each fragment, the TC, the Primary, and two Backups exchange Prepare messages and then Commit messages]
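The message formula on this slide can be checked with a short calculation. The function name below is made up for illustration. The formula itself is taken from the slide.

```python
def messages_per_transaction(operations, replicas):
    """messages = 2 x operations x (replicas + 1), as stated on the slide."""
    return 2 * operations * (replicas + 1)


# The slide's example: a transaction with two operations and three replicas.
example = messages_per_transaction(operations=2, replicas=3)
```

For the slide's example this gives 2 x 2 x (3 + 1) = 16 messages.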

Page 11: On-line Backup & Restore

[Diagram: Transactions run against the logical configuration (Node groups 1 and 2, Nodes 1 to 4, fragments F1 to F4). Check pointing & log files (Data files, UNDO log, REDO log) are paired with System Restart; Back-up files (Meta-data, Data, Change Log) are paired with Restore]

Page 12: Failure detection: Heartbeats, lost connections

• Nodes organized in a logical circle
• Heartbeat messages sent to next node in circle
• All nodes must have the same view of which nodes are alive

[Diagram: DB Node 1, DB Node 2, DB Node 3, and DB Node 4 arranged in a circle]
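The logical circle can be sketched in a few lines of Python. This is only an illustration under the assumption that the circle follows node-id order. The function name is hypothetical and the real protocol (timeouts, agreement on the set of live nodes) is not modeled.

```python
def heartbeat_targets(alive_nodes):
    """Map each alive node to the next alive node in the logical circle."""
    ring = sorted(alive_nodes)  # assumption: the circle follows node-id order
    return {node: ring[(i + 1) % len(ring)] for i, node in enumerate(ring)}
```

With nodes 1 to 4 alive, each node sends to its successor and node 4 wraps around to node 1. Once node 3 is declared dead, node 2 sends to node 4 instead. This only works if every node computes the circle from the same list of live nodes.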

Page 13: Node Recovery

Example with two replicas:
• TC coordinates fragment operations
• If there are two fragment copies, then both are part of the transaction (update, insert, and delete)
• Read operations only read from one replica

[Diagram: Running Node (Primary) copying fragments to Restarting Node (Backup); labels "Copied fragments" and "Copying"]

Page 14: Access methods for NDB tables

• 1) Primary key access
• 2) Full table scan (parallel)
• 3) Unique Index Key access
• 4) Parallel Range Scan (on ordered index)
• Many Primary Key and Unique Index Key accesses can be performed in parallel (in the same or different transactions)

Page 15: Primary Key Access

• Define Table, Operation Type
• Call equal on all primary key attributes
• Columns to get/set
• Support for very simple interpreter functions in update/delete (e.g. AMOUNT = AMOUNT + 10)
• All tables have Primary Key (hidden if no primary key was specified in CREATE TABLE)
• Primary Key or part of it used as distribution key
• Linear hashing variant used locally for mapping to data page and row locks
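The slide mentions a linear hashing variant without describing it. As background only, here is the textbook linear hashing address computation in Python. It is not NDB's variant, and the function and parameter names are chosen for illustration.

```python
def bucket_for(hash_value, initial_buckets, level, split_pointer):
    """Textbook linear hashing: map a hash value to a bucket number.

    level         -- number of completed doubling rounds
    split_pointer -- buckets below this index were already split in the current round
    """
    buckets_at_level = initial_buckets * (2 ** level)
    bucket = hash_value % buckets_at_level
    if bucket < split_pointer:
        # This bucket was already split, so the next round's hash function applies.
        bucket = hash_value % (2 * buckets_at_level)
    return bucket
```

With 4 initial buckets, level 0, and a split pointer of 2, hash value 5 first maps to bucket 1, which has been split, so it is rehashed to bucket 5. Hash value 6 maps to bucket 2, which has not been split yet, and stays there. The table grows one bucket at a time instead of doubling at once.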

Page 16: Full table scan

• Define Table and type of scan
• Define columns to get
• Support for very simple filters (e.g. AMOUNT < 10)
• Support for take over of locked row to updating primary key access
• Scan is performed in parallel on many fragments
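A rough Python sketch of a filtered scan that runs on several fragments at once. The FRAGMENTS data, the function names, and the use of a thread pool are assumptions made for illustration. The sketch does not use the NDB API and does not model locking or lock take-over.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of one table; each row is a dict.
FRAGMENTS = [
    [{"id": 1, "AMOUNT": 5}, {"id": 2, "AMOUNT": 40}],
    [{"id": 3, "AMOUNT": 9}],
    [{"id": 4, "AMOUNT": 10}, {"id": 5, "AMOUNT": 0}],
    [],
]


def scan_fragment(rows, predicate, columns):
    """Scan one fragment: keep rows that pass the filter, return only the requested columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def parallel_scan(fragments, predicate, columns):
    """Scan all fragments concurrently and concatenate the per-fragment results."""
    with ThreadPoolExecutor(max_workers=max(1, len(fragments))) as pool:
        parts = pool.map(lambda rows: scan_fragment(rows, predicate, columns), fragments)
        return [row for part in parts for row in part]
```

For example, parallel_scan(FRAGMENTS, lambda row: row["AMOUNT"] < 10, ["id"]) applies the slide's filter AMOUNT < 10 in every fragment and returns only the id column of the matching rows.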

Page 17: Unique Key Index Access

• Specify Index Table and Table and Operation Type
• Call equal on unique index attributes
• Rest as Primary Key access

Page 18: Parallel Range Scan

• Specify Ordered Index name and Table name and scan type

• Specify lower and upper bound (or equal) of range scan

• Rest is the same as full table scan
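A small Python sketch of a range scan with a lower and an upper bound, applied to every fragment's ordered index. It assumes inclusive bounds and models each index as a sorted list. Merging the per-fragment results into one sorted list is an extra step chosen for this illustration. The data and names are hypothetical.

```python
from bisect import bisect_left, bisect_right
import heapq

# Hypothetical ordered index per fragment: sorted lists of key values.
INDEX_FRAGMENTS = [
    [4000, 4010],
    [4007],
    [3999, 4020],
    [4000],
]


def range_scan_fragment(sorted_keys, low, high):
    """Return the keys k with low <= k <= high from one fragment's ordered index."""
    start = bisect_left(sorted_keys, low)
    end = bisect_right(sorted_keys, high)
    return sorted_keys[start:end]


def range_scan(index_fragments, low, high):
    """Apply the same bounds to every fragment and merge the sorted partial results."""
    parts = [range_scan_fragment(keys, low, high) for keys in index_fragments]
    return list(heapq.merge(*parts))
```

Setting low equal to high gives the "equal" case from the slide. Each fragment can be scanned independently, which is what makes the range scan parallel.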

Page 19: Unique Hash Index

CREATE TABLE SERVICE
(SID INTEGER PRIMARY KEY,
 IPADDR VARCHAR(15), PORT INTEGER, UNIQUE(IPADDR, PORT));

SERVICE table:
SID  IPADDR          PORT
17   213.114.20.229  4000
24   213.114.20.218  4007
2    213.114.20.235  4000
13   213.114.20.207  4010

Index table:
IPADDR          PORT  NDB$PK
213.114.20.235  4000  2
213.114.20.229  4000  17
213.114.20.207  4010  13
213.114.20.218  4007  24
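The relation between the two tables on this slide can be sketched in Python with plain dictionaries. The values are those shown on the slide. The variable and function names are illustrative assumptions, and nothing here reflects how NDB stores or distributes the index table.

```python
# Primary table SERVICE, keyed by SID (values from the slide).
SERVICE = {
    17: {"IPADDR": "213.114.20.229", "PORT": 4000},
    24: {"IPADDR": "213.114.20.218", "PORT": 4007},
    2: {"IPADDR": "213.114.20.235", "PORT": 4000},
    13: {"IPADDR": "213.114.20.207", "PORT": 4010},
}

# Index table: (IPADDR, PORT) -> NDB$PK, derived from the primary table.
UNIQUE_INDEX = {(row["IPADDR"], row["PORT"]): sid for sid, row in SERVICE.items()}


def lookup_by_unique_key(ipaddr, port):
    """First find the primary key in the index table, then read the primary table."""
    sid = UNIQUE_INDEX.get((ipaddr, port))
    if sid is None:
        return None
    return sid, SERVICE[sid]
```

Looking up ("213.114.20.235", 4000) first finds NDB$PK 2 in the index table and then reads row 2 of SERVICE. A key that is not in the index returns None without touching the primary table.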

Page 20: Ordered Index

CREATE INDEX SINDX ON SERVICE(IPADDR, PORT);

Index entries shown:
213.114.20.207  4010
213.114.20.235  4000
213.114.20.218  4007
213.114.20.229  4000

SERVICE table:
SID  IPADDR          PORT
17   213.114.20.229  4000
24   213.114.20.218  4007
2    213.114.20.235  4000
13   213.114.20.207  4010

Page 21: Index Support

• Index maintenance through internal triggers (insert, delete, update) on primary table
– Insert on primary table (insert into index)
– Delete from primary table (delete from index)
– Update of primary table (delete from and insert into index)
• Explicit usage through NDB API
– Unique hash index accessed in similar manner as primary key access
– Ordered index accessed in similar manner as scan
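The three maintenance rules can be illustrated with a toy table in Python. The class and method names are hypothetical. In NDB the index is kept up to date by internal triggers, which this sketch replaces with ordinary method calls.

```python
class IndexedTable:
    """Toy table with one unique index that is updated on every write."""

    def __init__(self, index_columns):
        self.index_columns = index_columns
        self.rows = {}   # primary key -> row dict
        self.index = {}  # tuple of indexed values -> primary key

    def _index_key(self, row):
        return tuple(row[c] for c in self.index_columns)

    def insert(self, pk, row):
        key = self._index_key(row)
        if pk in self.rows or key in self.index:
            raise KeyError("duplicate primary key or unique key")
        self.rows[pk] = dict(row)
        self.index[key] = pk  # insert on primary table: insert into index

    def delete(self, pk):
        row = self.rows.pop(pk)
        del self.index[self._index_key(row)]  # delete from primary table: delete from index

    def update(self, pk, changes):
        old_row = self.rows[pk]
        new_row = {**old_row, **changes}
        old_key, new_key = self._index_key(old_row), self._index_key(new_row)
        if new_key != old_key and new_key in self.index:
            raise KeyError("duplicate unique key")
        del self.index[old_key]   # update: delete from index ...
        self.index[new_key] = pk  # ... and insert into index
        self.rows[pk] = new_row
```

An update that changes PORT from 4000 to 4001 removes the old (IPADDR, 4000) entry and adds an (IPADDR, 4001) entry pointing at the same primary key.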

Page 22: Index Support

• Implicit usage through SQL
– Unique hash index chosen if primary key is not bound and unique hash index is bound
– Ordered index is scanned if a range scan is needed
– Query optimization can re-order query plans and use indexes

Page 23: Indexes, Implementation Overview

• Unique hash index
– Stored as ordinary table
– Data is stored in index (copies of attributes)
– Distributed differently than primary table
• Ordered indexes
– Stored as T-trees in each node
– Data is not stored in index (direct reference to attributes)
– Distributed in the same way as the primary table

Page 24: Fragmentation of Primary Table

NDB Cluster: 4 node configuration on 2 dual processor machines

• Horizontal fragmentation of SERVICE table (4 fragments)
• Fragments distributed on nodes
• 2 copies of data (Fx – primary replica, Fx – secondary replica)

[Diagram, logical configuration: Node group 1 and Node group 2 containing Nodes 1 to 4; fragments F1 to F4 of SERVICE (columns SID, IPADDR, PORT) each stored on two nodes]
[Diagram, physical configuration: Nodes 1 to 4 on two Sun E220 machines connected by TCP/IP or SCI]

Page 25: Unique Hash Index, Fragmentation of Index Table

• Horizontal fragmentation of IP_ADDR_1 table
• Fragments distributed on nodes

[Diagram: index table IP_ADDR_1 (columns IPADDR, PORT, NDB$PK) split into fragments FI1 to FI4, stored on Nodes 1 to 4 in Node groups 1 and 2 next to the primary table fragments F1 to F4; each fragment appears on two nodes]

Page 26: Unique Hash Index, Two-step Access

Step 1: Index table lookup
Step 2: Operation on primary table

TC: Transaction Coordinator

[Diagram: the TC and the index fragments FI1 to FI4 and primary table fragments F1 to F4 on Nodes 1 to 4 in Node groups 1 and 2]

Page 27: Unique Hash Index Maintenance

Step 1: Update on Primary Table
Step 2: Delete old index reference, insert new reference

TC: Transaction Coordinator

[Diagram: the TC and the index fragments FI1 to FI4 and primary table fragments F1 to F4 on Nodes 1 to 4 in Node groups 1 and 2]

Page 28: Ordered Index, Parallel Range Scan

TC: Transaction Coordinator

[Diagram: the TC and a Parallel Scan over fragments F1 to F4 on Nodes 1 to 4 in Node groups 1 and 2]

Page 29: Ordered Index Maintenance

Update on Primary Table:
1) Delete index entry (at commit)
2) Insert index entry (at update)

TC: Transaction Coordinator

[Diagram: the TC and fragments F1 to F4 on Nodes 1 to 4 in Node groups 1 and 2]

Page 30: MySQL Cluster APIs

• All the MySQL APIs
• MGM API (C API for cluster management)
• NDB API (NDB table access API, C++)

Page 31: NDB API: Highly optimized native C++ API

• Create Table, Drop Table, Create Index, Drop Index (only to be used from NDB storage handler)
• Read, Update, Delete, Insert and Write (= Replace)
• Parallel operations and transactions through synchronous interface
• Parallel operations/transactions through asynchronous interface
• Load Balancing towards NDB nodes when starting new transaction
• Possible to place transaction coordinator where data of transaction resides
• Always uses a live NDB node when starting a new transaction
• Adaptive Send algorithm for optimized performance
• Interpreted programs for scan filtering
• Event (detached trigger) support on operation and column level

Page 32: Looking at performance

• Five synchronous insert transactions (10 x TCP/IP time)
• Five inserts in one synchronous transaction (2 x TCP/IP time)
• Five asynchronous insert transactions (2 x TCP/IP time)

[Diagrams: message exchange between Application and NDB Kernel (Database nodes) for each of the three cases]
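The three figures on this slide follow from counting network traversals. The Python sketch below is a simplified model that assumes one request and one reply per round trip, ignores processing time, and assumes that all batched or asynchronous work fits in a single send. The function names are made up for illustration.

```python
def sync_separate_transactions(n_inserts):
    """Each transaction waits for its own request and reply before the next one starts."""
    return 2 * n_inserts


def one_sync_transaction(n_inserts):
    """All inserts travel together in one transaction: one request, one reply."""
    return 2


def async_transactions(n_inserts):
    """All transactions are sent before waiting, so their round trips overlap."""
    return 2
```

Results are in units of one TCP/IP message time. For five inserts the model gives 10, 2, and 2, matching the slide.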

Page 33: When to use which API in MySQL Cluster

• Most applications will be fine with the MySQL APIs
• When performance is really critical, use the synchronous NDB API interface, which is fairly easy to use
⇒ Around 3 times the performance
• If performance needs to be even better and the technical competence is there, use the asynchronous NDB API interface
⇒ Another 2 times better performance

Page 34: Cluster Development

• CPUs are getting faster as always
• Multiple cores per chip
• Multiple chips per board, 2 common
• Clusters of low-end servers
⇒ 4 dimensions of development
• MySQL Cluster uses ALL of these developments simultaneously by automatically parallelising the low-level database operations

Page 35: MySQL Cluster Future Features

• On-line Add Node
• Multi-threaded NDB kernel nodes
• Mixed-endian Clusters

Page 36: MySQL Cluster RT solution on Dual-Core computer

Memory Optimized Architecture

[Diagram: thread placement on CPU 0 and CPU 1; labels: Connection Threads, Watch Dog thread, FileSystem threads, Rest, Main Thread, SuperSockets Read/Write]

Page 37: MySQL Cluster RT solution on Quad-Core computer using 4 data nodes

CPU optimized architecture

[Diagram: CPU 0, CPU 1, CPU 2, and CPU 3, each labeled with Connection Threads, Watch Dog thread, FileSystem threads, SuperSockets Read/Write, and Main Thread]

Page 38: Minimal Cluster, 2 data nodes, 2 computers, DBT2 with 2 MySQL servers

[Chart: Transactions per minute (0 to 30000) versus Parallel activity (1, 2, 4, 8, 16, 32); series: eth, sci, eth + rt, sci + rt. Annotations: Low load, improvement is due to latency; Heavy load, improvement is due to efficiency]

[Chart: Response time, 2 MySQL servers, in ms (0 to 45) versus Parallel activity (1, 2, 4, 8, 16, 32); series: eth, sci, eth + rt, sci + rt]

Page 39: CPU-optimized architecture, Distribution aware (8 data nodes)

[Chart: Transactions per minute (0 to 120 000) versus Parallel activity (1, 2, 4, 8, 16, 32, 64, 128, 256); series: eth, sci, eth+rt, sci + rt. Annotations: Low load, Heavy load; improvement is due to latency]

[Chart: Improvements compared to ethernet, in % (-20 to 100) versus Parallel activity (1, 2, 4, 8, 16, 32, 64, 128, 256); series: Improvement sci vs eth, Improvement eth+rt vs eth, Improvement sci+rt vs eth]

Page 40: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

 

1Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

Architecture ofNDB

Mikael RonströmSenior MySQL Architect

MySQL AB

Page 41: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

 

2Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

MySQL Cluster Architecture

Scalability on three levels:1.   # Applications3. # MySQL servers4. # Database nodes

Application Application Application

MySQLServer

NDB Kernel(Database nodes)

MySQLServer

MySQLServer

DB DB

DBDB

MySQL Cluster

Page 42: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

 

3Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

MySQL Cluster Overview

• Main memory– Complemented with logging to disk

• Clustered– Database distributed over many nodes– Synchronous replication between nodes– Database automatically partitioned over the nodes– Fail­over capability in case of node failure

• Update in one MySQL server, immediately see result in another

• Support for high update loads (as well as very high read loads)

Page 43: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

 

4Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

MySQL Cluster Value

• Bringing clustered data management to a broader range of customers– Affordable– Ease of use– Upgrade path from current MySQL installations

• Making high­availability main­stream– On­going standardization efforts assume off­the­shelf 

components– New database technology using main­memory storage with 

replication gives persistency and short fail­over times

Page 44: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

 

5Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

MySQL Cluster Features (1)

• Support for mixed main memory and disk­based columns (indexed columns always in main memory)

• Transactions• Synchronous Replication• Auto­synch at restart of NDB node• Recovery from checkpoints and logs at cluster crash• Subscription to row changes• Indexes (Unique hash and T­tree ordered)

Page 45: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

 

6Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

MySQL Cluster Features (2)

• On­line Backup• On­line Add Index• On­line Drop Index• On­line Add Column• On­line SW Upgrade• Asynchronous replication between clusters• Conflict Resolution in Multi­Master Replication

Page 46: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

7Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

History

• Requirement collection, prototypes and Research 1991­1996• Draws from Ericsson’s extensive experience in building

reliable solutions for the Telecom/IP industry• Development start 1996 H2• Version 0.1 demo august 1997 (primary key lookups and updates)• Version 0.2 february 1998 (transaction support)• Version 0.3 october 1998 (Cluster crash support)• 1998: Prototype of Web Cache Server using Java NDB API• Version 1.0 april 2001 (Perl DBI,scan, ..)• Version 1.1 august 2001 (First production release)• Version 1.41 may 2002 (Node Recovery, Production release)• Version 2.11 may 2003 (ODBC, On­line Backup, Unique Index)• Version 2.12.3 february 2004 (production release for B2)• Version 3.4.5 14th April 2004 (MySQL integration, Ordered Index, On­

line SW Upgrade) put in production by B2

Page 47: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

8Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

NDB API

MySQL Cluster ArchitectureApplication

ManagementClient

MGMAPI

MySQLServer

NDB Kernel(Database nodes)

MGM Server(Management nodes)

Application

NDB API

Application

MySQLServer

Application

Application

NDB API

Page 48: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

 

9Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

Node group 1 Node group 2

Node 4Node 3Node 2Node 1

F1 F1

F3F3

F2

F4

F2

F4

Logical configuration

2 copies of data

Fx – primary replica Fx – secondary replica

$$ValAccNoPnr

Table 1

F1

F2

F4

 Horizontal fragmentation    of Table 1 (4 fragments)

• Fragments distributed    on nodes

NDB Cluster : 4 node configuration on 2 dual processor machines

Dual CPUDual CPU

Physical configuration

Node 1

Node 3 Node 4

Node 2

TCP/IP or SCI

F3

Data distribution

Page 49: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

 

10Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

Synchronous Replication: Low failover time

TC

Primary

Backup

Backup

Prepare

Prepare

Prepare Commit

Commit

Commit

Example showing transactionwith two operations using 

three replicas

messages  =  2  x  operations x (replicas + 1)

Primary

Backup

BackupPrepare Commit

1. Prepare F1 1. Prepare F22. Commit F1 2. Commit F2

Page 50: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

11 

   

11Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

Node group 1 Node group 2

Node 4Node 3Node 2Node 1

F1 F1

F3F3

F2

F4

F2

F4

Logical configuration

On­line Backup & Restore

Check pointing &log files(Data files,UNDO logREDO log)

System Restart

Back­up files(Meta­data,Data,Change Log)

Restore

Transactions

Page 51: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

 

12Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

Failure detection: Heartbeats, lost connections

DBNode 1

DBNode 3

DBNode 2

DBNode 4

Nodes organizedin a logical circle

Heartbeat messagessent to next node in circle

All nodes musthave the same view

of which nodes are alive

Page 52: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

   

 

13Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

Node Recovery

Running Node(Primary)

Restarting Node(Backup)

Copiedfragments

Copying

TC coordinates fragment operations

If there are twofragment copies,

then both are part of transaction

Example with two replicas

Update, insert, and deleteRead operations only read from one replica

Page 53: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

14 

   

14Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

Access methods for NDB tables

• 1) Primary key access• 2) Full table scan (parallel)• 3) Unique Index Key access• 4) Parallel Range Scan (on ordered index)• Many Primary Key and Unique Index Key 

accesses can be performed in parallel (in the same or different transactions)

Page 54: Architecture of NDB - MySQL · Distributed differently than primary table • Ordered indexes Stored as Ttrees in each node Data is not stored in index (direct reference to attributes)

15 

   

15Copyright 2007 MySQL AB The World’s Most Popular Open Source Database

Primary Key Access

• Define Table, Operation Type
• Call equal on all primary key attributes
• Columns to get/set
• Support for very simple interpreter functions in update/delete (e.g. AMOUNT = AMOUNT + 10)
• All tables have Primary Key (hidden if no primary key was specified in CREATE TABLE)
• Primary Key or part of it used as distribution key
• Linear hashing variant used locally for mapping to data page and row locks
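
The flow above can be pictured with a small in-memory model: the distribution key is hashed to pick a fragment, and a local hash lookup finds the row. This is only a sketch under stated assumptions. `fragment_for`, `Table`, and `update_add` are hypothetical names, `hashlib.md5` is a stand-in because the slides do not name the hash function, and a Python dict stands in for the local linear-hashing structure.

```python
import hashlib


def fragment_for(distribution_key, num_fragments):
    """Map a distribution key (tuple of column values) to a fragment number.

    md5 is only a stand-in hash chosen for this sketch.
    """
    digest = hashlib.md5(repr(distribution_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_fragments


class Table:
    """Toy table: one dict per fragment, keyed by primary key."""

    def __init__(self, num_fragments):
        self.fragments = [dict() for _ in range(num_fragments)]

    def _fragment(self, pk):
        # Here the whole primary key is used as the distribution key.
        return self.fragments[fragment_for(pk, len(self.fragments))]

    def insert(self, pk, row):
        self._fragment(pk)[pk] = dict(row)

    def read(self, pk, columns):
        """Primary key read: equal on the key, then fetch the named columns."""
        row = self._fragment(pk)[pk]
        return {c: row[c] for c in columns}

    def update_add(self, pk, column, delta):
        """Simple interpreted update, e.g. AMOUNT = AMOUNT + 10."""
        self._fragment(pk)[pk][column] += delta


accounts = Table(num_fragments=4)
accounts.insert((17,), {"AMOUNT": 100})
accounts.update_add((17,), "AMOUNT", 10)
```

Because the fragment is computed from the key alone, any node can route a primary key operation without consulting the other fragments.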


Full table scan

• Define Table and type of scan
• Define columns to get
• Support for very simple filters (e.g. AMOUNT < 10)
• Support for takeover of a locked row by an updating primary key access
• Scan is performed in parallel on many fragments
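
A minimal sketch of a filtered scan running on several fragments at once, assuming each fragment is simply a list of row dicts. The function names and the thread pool are illustrative choices, not how the NDB kernel schedules scans.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_fragment(rows, columns, row_filter):
    """Scan one fragment: apply the filter, return only the requested columns."""
    return [{c: row[c] for c in columns} for row in rows if row_filter(row)]


def full_table_scan(fragments, columns, row_filter):
    """Run the same scan on every fragment in parallel and concatenate the results."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        parts = pool.map(
            lambda rows: scan_fragment(rows, columns, row_filter), fragments
        )
        return [row for part in parts for row in part]


fragments = [
    [{"ID": 1, "AMOUNT": 5}],
    [{"ID": 2, "AMOUNT": 20}],
    [{"ID": 3, "AMOUNT": 7}],
]
# Filter corresponding to the slide's example: AMOUNT < 10
cheap = full_table_scan(fragments, ["ID"], lambda row: row["AMOUNT"] < 10)
```

Applying the filter inside each fragment means only matching rows, and only the requested columns, travel back to the caller.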


Unique Key Index Access

• Specify Index Table and Table and Operation Type
• Call equal on unique index attributes
• Rest is the same as Primary Key access


Parallel Range Scan

• Specify Ordered Index name and Table name and scan type

• Specify lower and upper bound (or equal) of range scan

• Rest is the same as full table scan


Unique Hash Index

CREATE TABLE SERVICE
(SID INTEGER PRIMARY KEY,
 IPADDR VARCHAR(15),  -- length assumed; the slide gives none
 PORT INTEGER,
 UNIQUE(IPADDR, PORT));

Primary table SERVICE:

SID  IPADDR          PORT
17   213.114.20.229  4000
24   213.114.20.218  4007
2    213.114.20.235  4000
13   213.114.20.207  4010

Unique hash index table:

IPADDR          PORT  NDB$PK
213.114.20.235  4000  2
213.114.20.229  4000  17
213.114.20.207  4010  13
213.114.20.218  4007  24


Ordered Index

CREATE INDEX SINDX ON SERVICE(IPADDR,PORT);

Index entries (IPADDR, PORT) in key order:

213.114.20.207  4010
213.114.20.218  4007
213.114.20.229  4000
213.114.20.235  4000

Primary table SERVICE:

SID  IPADDR          PORT
17   213.114.20.229  4000
24   213.114.20.218  4007
2    213.114.20.235  4000
13   213.114.20.207  4010


Index Support

• Index maintenance through internal triggers (insert, delete, update) on primary table
  - Insert on primary table (insert into index)
  - Delete from primary table (delete from index)
  - Update of primary table (delete from and insert into index)
• Explicit usage through NDB API
  - Unique hash index accessed in similar manner as primary key access
  - Ordered index accessed in similar manner as scan


Index Support

• Implicit usage through SQL
  - Unique hash index chosen if primary key is not bound and unique hash index is bound
  - Ordered index is scanned if a range scan is needed
  - Query optimization can re-order query plans and use indexes


Indexes, Implementation Overview

• Unique hash index
  - Stored as ordinary table
  - Data is stored in index (copies of attributes)
  - Distributed differently than primary table
• Ordered indexes
  - Stored as T-trees in each node
  - Data is not stored in index (direct reference to attributes)
  - Distributed in the same way as the primary table


Fragmentation of Primary Table

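[Figure: the SERVICE table (columns SID, IPADDR, PORT) split into fragments F1, F2, F3, and F4]

• Horizontal fragmentation of SERVICE table (4 fragments)
• Fragments distributed on nodes
• 2 copies of data: each fragment Fx has a primary replica and a secondary replica

NDB Cluster: 4 node configuration on 2 dual processor machines

Logical configuration: Node group 1 and Node group 2, containing Node 1 to Node 4. One node group stores F1 and F4, the other stores F2 and F3, with one copy of each fragment on each node of the group.

Physical configuration: two Sun E220 machines, each running two of the four nodes, connected by TCP/IP or SCI.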
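
To make the replica roles concrete, here is a small sketch that looks up the primary and backup node for a fragment and shows what changes when a node is lost. The node numbering inside each group, the fragment-to-group table, and the rule for picking the preferred primary are all assumptions made for illustration; the slide only shows that every fragment has two copies within one node group.

```python
# Assumed layout: node group -> data nodes, fragment -> node group.
NODE_GROUPS = {1: [1, 2], 2: [3, 4]}
FRAGMENT_GROUP = {"F1": 1, "F4": 1, "F2": 2, "F3": 2}


def replicas(fragment, alive):
    """Return (primary_node, backup_nodes) for a fragment, given the live nodes."""
    nodes = NODE_GROUPS[FRAGMENT_GROUP[fragment]]
    # Hypothetical rule: rotate the preferred primary by fragment number
    # so that primaries are spread over the nodes of a group.
    shift = int(fragment[1:]) % len(nodes)
    ordered = nodes[shift:] + nodes[:shift]
    live = [n for n in ordered if n in alive]
    if not live:
        raise RuntimeError(f"no live replica of {fragment}: node group is down")
    return live[0], live[1:]


all_nodes = {1, 2, 3, 4}
primary, backups = replicas("F1", all_nodes)               # both copies available
primary_after, backups_after = replicas("F1", all_nodes - {primary})  # backup takes over
```

With two copies per fragment the cluster survives the loss of one node per node group; losing both nodes of a group leaves no copy of that group's fragments.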


Unique Hash Index, Fragmentation of Index Table

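[Figure: the index table IP_ADDR_1 (columns IPADDR, PORT, NDB$PK) split into fragments FI1, FI2, FI3, and FI4]

• Horizontal fragmentation of IP_ADDR_1 table
• Fragments distributed on nodes

[Figure: Node group 1 and Node group 2 with Node 1 to Node 4; each node stores index fragments (FI1 to FI4) next to primary table fragments (F1 to F4), with two copies of every fragment]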


Unique Hash Index, Two-step Access

[Figure: Node group 1 and Node group 2 with Node 1 to Node 4, holding index fragments FI1 to FI4 and primary table fragments F1 to F4; the TC issues both steps]

Step 1: Index table lookup
Step 2: Operation on primary table

TC: Transaction Coordinator
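
The two steps can be sketched with plain dictionaries, using the SERVICE rows from the earlier slide. The dictionaries and the function name `read_by_unique_key` are hypothetical stand-ins for the index table and the primary table; in the cluster each step may be served by a different node.

```python
# Primary table: SID -> row
service = {
    17: {"SID": 17, "IPADDR": "213.114.20.229", "PORT": 4000},
    24: {"SID": 24, "IPADDR": "213.114.20.218", "PORT": 4007},
    2: {"SID": 2, "IPADDR": "213.114.20.235", "PORT": 4000},
    13: {"SID": 13, "IPADDR": "213.114.20.207", "PORT": 4010},
}

# Index table: (IPADDR, PORT) -> NDB$PK, i.e. the primary key of the row
service_index = {(row["IPADDR"], row["PORT"]): sid for sid, row in service.items()}


def read_by_unique_key(index_table, primary_table, ipaddr, port):
    """Two-step access: index table lookup, then operation on the primary table."""
    pk = index_table.get((ipaddr, port))  # Step 1: index table lookup
    if pk is None:
        return None                       # no row with this unique key
    return primary_table[pk]              # Step 2: primary key read


row = read_by_unique_key(service_index, service, "213.114.20.235", 4000)
```

Because the index table is distributed on its own key, step 1 and step 2 generally land on different fragments, which is why unique key access costs one more hop than primary key access.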


Unique Hash Index Maintenance

[Figure: Node group 1 and Node group 2 with Node 1 to Node 4, holding index fragments FI1 to FI4 and primary table fragments F1 to F4; the TC drives both steps]

Step 1: Update on Primary Table
Step 2: Delete old index reference, insert new reference

TC: Transaction Coordinator
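
A minimal sketch of these maintenance steps, assuming the primary table and the index table are plain dictionaries. `update_service` is a hypothetical helper; in the cluster the index change is fired by an internal trigger inside the same transaction rather than called by the application.

```python
def update_service(primary_table, index_table, sid, new_ipaddr, new_port):
    """Update the indexed columns of one row and keep the unique index in step."""
    row = primary_table[sid]
    old_key = (row["IPADDR"], row["PORT"])
    new_key = (new_ipaddr, new_port)
    if new_key != old_key and new_key in index_table:
        raise ValueError("unique constraint violated")
    # Step 1: update on primary table
    row["IPADDR"], row["PORT"] = new_ipaddr, new_port
    if new_key != old_key:
        # Step 2: delete old index reference, insert new reference
        del index_table[old_key]
        index_table[new_key] = sid


primary = {2: {"SID": 2, "IPADDR": "213.114.20.235", "PORT": 4000}}
index = {("213.114.20.235", 4000): 2}
update_service(primary, index, 2, "213.114.20.235", 4001)
```

The uniqueness check happens before any change is made, so a rejected update leaves both tables untouched.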


Ordered Index, Parallel Range Scan

[Figure: the TC starts a parallel scan on primary table fragments F1 to F4, stored on Node 1 to Node 4 in Node group 1 and Node group 2]

TC: Transaction Coordinator
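
Since an ordered index lives next to each fragment, a range scan can ask every fragment for its matching slice independently. The sketch below uses a sorted Python list in place of the per-fragment T-tree and merges the slices into key order; the merge step, the function names, and the way the four rows are spread over fragments are assumptions for illustration. The fragment scans run one after another here, but each is independent and could run in parallel.

```python
import bisect
import heapq


def range_scan_fragment(sorted_keys, low, high):
    """Return the keys in [low, high] from one fragment's ordered index."""
    lo = bisect.bisect_left(sorted_keys, low)
    hi = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[lo:hi]


def range_scan(fragment_indexes, low, high):
    """Scan every fragment's index for the range and merge the sorted slices."""
    parts = [range_scan_fragment(keys, low, high) for keys in fragment_indexes]
    return list(heapq.merge(*parts))


# (IPADDR, PORT) keys of the SERVICE rows, split over four fragments (split is made up).
fragment_indexes = [
    [("213.114.20.207", 4010)],
    [("213.114.20.218", 4007), ("213.114.20.235", 4000)],
    [("213.114.20.229", 4000)],
    [],
]
hits = range_scan(
    fragment_indexes, ("213.114.20.218", 0), ("213.114.20.229", 65535)
)
```

Note that the IP addresses are compared as strings here, which only gives the expected order because the sample addresses have the same number of digits in every part.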


Ordered Index Maintenance

[Figure: the TC applies an update on the primary table; fragments F1 to F4 are stored on Node 1 to Node 4 in Node group 1 and Node group 2]

Update on Primary Table:
1) Delete index entry (at commit)
2) Insert index entry (at update)

TC: Transaction Coordinator


MySQL Cluster APIs

• All the MySQL APIs
• MGM API (C API for cluster management)
• NDB API (NDB table access API, C++)


NDB API: Highly optimized native C++ API

• Create Table, Drop Table, Create Index, Drop Index (only to be used from NDB storage handler)
• Read, Update, Delete, Insert and Write (= Replace)
• Parallel operations and transactions through synchronous interface
• Parallel operations/transactions through asynchronous interface
• Load Balancing towards NDB nodes when starting new transaction
• Possible to place transaction coordinator where data of transaction resides
• Always uses a live NDB node when starting a new transaction
• Adaptive Send algorithm for optimized performance
• Interpreted programs for scan filtering
• Event (detached trigger) support on operation and column level


Looking at performance

[Figure: three message diagrams between Application and NDB Kernel (Database nodes)]

• Five synchronous insert transactions (10 x TCP/IP time)
• Five inserts in one synchronous transaction (2 x TCP/IP time)
• Five asynchronous insert transactions (2 x TCP/IP time)
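
The three figures follow from counting round trips, each costing one request plus one reply. The sketch below reproduces that arithmetic under simplifying assumptions: processing time is ignored, and operations that are sent together are assumed to share one request and one reply. The function and mode names are made up for the example.

```python
def network_cost(num_inserts, mode):
    """Network cost in units of one-way TCP/IP time for a given interface style."""
    if mode == "sync_transaction_per_insert":
        # The application waits for each transaction's reply before sending the next.
        round_trips = num_inserts
    elif mode in ("one_sync_transaction", "async_transactions"):
        # All inserts are sent before waiting, so they share a single round trip.
        round_trips = 1
    else:
        raise ValueError(f"unknown mode: {mode}")
    return 2 * round_trips  # one request and one reply per round trip


five_sync = network_cost(5, "sync_transaction_per_insert")
five_in_one = network_cost(5, "one_sync_transaction")
five_async = network_cost(5, "async_transactions")
```

The asynchronous interface reaches the same network cost as batching into one transaction while keeping the five inserts as separate transactions.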


When to use which API in MySQL Cluster

• Most applications will be fine with the MySQL APIs
• When performance is really critical, use the synchronous NDB API interface, which is fairly easy to use
  ⇒ Around 3 times the performance
• If performance needs to be even better and the technical competence is there, use the asynchronous NDB API interface
  ⇒ Another 2 times better performance


Cluster Development

• CPUs are getting faster as always
• Multiple cores per chip
• Multiple chips per board, 2 common
• Clusters of low-end servers
⇒ 4 dimensions of development
• MySQL Cluster uses ALL of these developments simultaneously by automatically parallelising the low-level database operations


MySQL Cluster Future Features

• On-line Add Node
• Multi-threaded NDB kernel nodes
• Mixed-endian Clusters


MySQL Cluster RT solution on Dual-Core computer

Memory Optimized Architecture

[Figure: the threads of one data node placed on CPU 0 and CPU 1: Main Thread; SuperSockets Read/Write; Rest (Connection Threads, Watch Dog thread, FileSystem threads)]


MySQL Cluster RT solution on Quad-Core computer using 4 data nodes

CPU optimized architecture

[Figure: CPU 0, CPU 1, CPU 2, and CPU 3, each running the same set of threads: Connection Threads, Watch Dog thread, FileSystem threads, SuperSockets Read/Write, and Main Thread]


Minimal Cluster, 2 data nodes, 2 computers
DBT2 with 2 MySQL servers

[Figure: Transactions per minute (axis 0 to 30000) against parallel activity (1, 2, 4, 8, 16, 32) for eth, sci, eth + rt, and sci + rt. Annotations: at low load, improvement is due to latency; at heavy load, improvement is due to efficiency.]

[Figure: Response time, 2 MySQL servers. Response time in ms (axis 0 to 45) against parallel activity (1, 2, 4, 8, 16, 32) for eth, sci, eth + rt, and sci + rt.]


CPU-optimized architecture, Distribution aware (8 data nodes)

[Figure: Transactions per minute (axis 0 to 120 000) against parallel activity (1, 2, 4, 8, 16, 32, 64, 128, 256) for eth, sci, eth + rt, and sci + rt. Annotations: Low load, Heavy load; improvement is due to latency.]

[Figure: Improvements compared to ethernet, CPU-optimized architecture, Distribution aware (8 data nodes). Improvement in % (axis -20 to 100) against parallel activity (1, 2, 4, 8, 16, 32, 64, 128, 256) for sci vs eth, eth + rt vs eth, and sci + rt vs eth.]