The Architecture of CUBRID

This document explains the architecture of the CUBRID Database Management System.

CONTENTS

1. Introduction
1.1 Overall Architecture of the CUBRID System
1.2 Process Architecture
1.2.1 Connection Configuration
2. Broker
2.1 The cub_broker Process
2.2 The cub_cas Process
3. Client and Server Modules
3.1 Module Configuration
3.1.1 Transaction Management Component
3.1.2 Server Storage Management Component
3.1.3 Client Storage Management Component
3.1.4 Object Management Component
3.1.5 Client-Server Communications
3.1.6 Thread Management Component
3.1.7 Query Processing
3.2 Detailed Description for the Modules
3.2.1 Transaction Management Component
3.2.2 Object Management Component
3.2.3 Query Processing

1. Introduction

CUBRID is an object-relational database management system (DBMS) consisting of the Database Server, the Broker, and the CUBRID Manager.

As the core component of the CUBRID Database Management System, the Database Server stores and manages data in a multi-threaded client/server architecture. The Database Server processes the queries entered by users and manages objects in the database. By using locking and logging, it provides seamless transactions even when multiple users use the database at the same time. It also supports database backup and restore for operational use.

The Broker is CUBRID-specific middleware that relays communication between the Database Server and external applications. It provides functions including connection pooling, monitoring, and log tracing and analysis.

The CUBRID Manager is a GUI tool for managing databases and brokers. It also provides the Query Editor, a tool that allows users to execute SQL queries on the Database Server.

The basic configuration of CUBRID is shown in Figure 1 below.

Figure 1. Basic Configuration of CUBRID


1.1 Overall Architecture of the CUBRID System

Figure 2. Overall Architecture of the CUBRID System

Figure 2 shows the overall architecture of the CUBRID system.

The CUBRID system follows the client/server model, which allows multiple applications to access the same database simultaneously. The client module (the Broker in Figure 2) and the server module (the Server in Figure 2) run on separate systems (computers) connected through a network. The same architecture applies even when a broker and a server run on the same system, because they are still connected via socket IPC. A server handles the requests from multiple clients in a single-process, multi-threaded environment, and each server process manages one database.

The client module parses the SQL queries that users or applications issue against the database and processes them up to the optimization stage. It then generates a query plan tree and sends it to the server. It receives the execution results from the server through cursor navigation and delivers them to the users or applications. The client caches object instances from the database in its memory so that data can be accessed quickly, whether through query execution results or directly by users or applications. In addition, it caches locks as well as objects from the server for concurrency control. Triggers and methods specified by users or applications are also executed in the client module.

The server module receives and processes requests from the client module (e.g., object requests or query execution requests based on a query execution tree) and then returns the results. The server can execute the requests from multiple clients in a single-process, multi-threaded environment. To support multiple client modules with an appropriate number of threads, server threads are allocated to each broker request, not to each broker. The server performs input and output operations on database and log volumes and provides access to the database volumes in units of files and pages. In addition, it manages the page buffer in memory and uses B+-tree indexes to increase retrieval speed. The server also provides concurrency control, deadlock detection, and failure recovery for multiple transactions.

1.2 Process Architecture

Figure 3. Process Architecture of the CUBRID System

Figure 3 shows the process architecture of the CUBRID system. A server host can run one master process (cub_master) and one or more database server processes (cub_server). Each client process (cub_cas), running on one of possibly several broker hosts, connects to a single database server process.

For each connection request from an application, the cub_broker process allocates a cub_cas, passes the connection to it, and manages it. The cub_cas process executes database queries from the application.


1.2.1 Connection Configuration

The cub_cas process connects to the defined connection port number of the master process. The master process

checks whether the requested database server is running; the connection request is rejected if the server is not

running. If the requested database server is running, the master process passes the connected socket to the requested

server process. Then, the server process communicates with the client process (cub_cas) directly through the socket.

The database server process connects to the master process's port, registers its server name (database name), and establishes a UNIX domain socket (or named pipe) connection to the master process. Over this connection, the master process passes the socket descriptor of each client (cub_cas) to the server; the connection is also kept open for server shutdown and other future operations. After the connection between the server and a client process (cub_cas) is established, the server process allocates a thread for each client request and performs the requested tasks.

Master Process (cub_master)

1. Checks whether another master process is running by connecting to cubrid_port_id.

2. Becomes a daemon process, opens a socket on the port defined as cubrid_port_id, and waits for connections from clients and servers.

3. If the connection is from a database server process, registers the server name and establishes a UNIX domain socket connection to that server process.

4. If the connection is from a client process (cub_cas), passes the connected socket number (socket descriptor) to the database server requested by the client, so that a socket connection is established between the client and the server.

Database Server Process (cub_server)

1. Connects to the designated port of the master process. If the connection fails, the server assumes that the master process is not running and aborts the connection attempt.

2. Once connected, registers its server name (database name) with the master process. If a server with the same name already exists, the registration is rejected and the server terminates.

3. Creates a UNIX domain socket (or named pipe) and sends its connection path (socket file path) to the master process. When the master process has connected to it, closes the socket connection to the designated port.

4. Waits for task requests from connected clients. While waiting, it also handles any new client connections relayed by the master process.

5. Accepts requests from connected clients and performs the tasks by allocating threads.

Client Process (cub_cas)

1. Connects to the master process, which runs on a remote or local server, through the port defined as cubrid_port_id.

2. Once connected, sends the name of the database it wants to connect to; the master process checks whether the corresponding database server process is registered and running. If there is no such server, the connection is rejected.

3. Receives response messages directly from the server, because the master process hands the socket connection between the client and the master process over to the corresponding server process.
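The descriptor hand-off described in this section can be illustrated with a small sketch. It is not CUBRID code: it assumes a POSIX system and uses Python's `sendmsg`/`recvmsg` with `SCM_RIGHTS` to pass an open socket over a UNIX domain socket. The helper names (`send_fd`, `recv_fd`, `relay_demo`) and the database name `demodb` are made up for the illustration, and socket pairs stand in for the real TCP and UNIX domain connections.

```python
import array
import socket


def send_fd(channel, fd, note):
    """Send one file descriptor plus a short note over a UNIX domain socket."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    channel.sendmsg([note], ancillary)


def recv_fd(channel):
    """Receive the note and the descriptor sent by send_fd()."""
    fds = array.array("i")
    note, ancdata, _flags, _addr = channel.recvmsg(64, socket.CMSG_LEN(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    return note, fds[0]


def relay_demo():
    # Stand-in for the UNIX domain socket between master and server.
    master_to_server, server_to_master = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Stand-in for the connection a client opened to the master's port.
    client_sock, accepted_by_master = socket.socketpair()

    # Master: hand the accepted client connection to the server, then drop its copy.
    send_fd(master_to_server, accepted_by_master.fileno(), b"demodb")
    accepted_by_master.close()

    # Server: receive the descriptor and answer the client directly.
    db_name, fd = recv_fd(server_to_master)
    direct = socket.socket(fileno=fd)
    direct.sendall(b"connected to " + db_name)

    reply = client_sock.recv(64)
    for sock in (direct, client_sock, master_to_server, server_to_master):
        sock.close()
    return reply
```

After the hand-off the master no longer takes part in the exchange, which mirrors the statement that the server then communicates with cub_cas directly through the socket.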

2. Broker

The Broker is middleware that relays communication between the database server and applications. It consists of the cub_broker and cub_cas processes.

2.1 The cub_broker Process

The cub_broker process allocates cub_cas, passes a connection and manages cub_cas for a connection request from

an application. cub_broker has a multi-threaded architecture and consists of the following threads:

main

This thread creates other threads and manages the number of cub_cas processes. It increases or decreases the

number of cub_cas processes depending on the number of requests in the job queue.

receiver_thread

This thread blocks in the accept() system call and puts each connection request from an application into the job queue.

dispatch_thread

This thread finds an available cub_cas for each connection request in the job queue and passes the connection to that cub_cas.

cas_monitor_thread

If cub_cas is abnormally terminated, this thread restarts cub_cas.
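The division of work between receiver_thread and dispatch_thread can be sketched as a producer/consumer pair around a job queue. The following is a simplified illustration, not the cub_broker implementation: the function `run_broker`, the request and cub_cas names, and the assumption that a cub_cas becomes free again immediately are all invented for the example.

```python
import queue
import threading


def run_broker(app_requests, cas_names):
    job_queue = queue.Queue()   # connection requests waiting for a cub_cas
    idle_cas = queue.Queue()    # cub_cas processes able to take a connection
    for name in cas_names:
        idle_cas.put(name)
    assignments = []

    def receiver():
        # Stands in for the accept() loop: enqueue every incoming request.
        for request in app_requests:
            job_queue.put(request)
        job_queue.put(None)     # sentinel: no more requests in this demo

    def dispatcher():
        while True:
            request = job_queue.get()
            if request is None:
                break
            cas = idle_cas.get()            # blocks until a cub_cas is free
            assignments.append((request, cas))
            idle_cas.put(cas)               # demo only: the cas is free again at once

    threads = [threading.Thread(target=receiver), threading.Thread(target=dispatcher)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return assignments
```

In this sketch the dispatcher simply waits when no cub_cas is idle; the main thread described above would instead start additional cub_cas processes when the job queue grows.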

2.2 The cub_cas Process

The cub_cas process executes database queries from an application and is single-threaded. It connects to the database server when it receives a “connection” request from an application, and then calls the function corresponding to each request from the application. After the connection with the application is terminated, the process can accept a connection from another application. When an application disconnects, the connection to the database server is not terminated. If the next application uses the same database as the current one, the existing database connection is reused.

Depending on the application's connection status, cub_cas is in one of four statuses: IDLE, BUSY, CLIENT_WAIT, or CLOSE_WAIT.

- IDLE: No connection is made to an application.

- BUSY: A connection is made to an application, and a request from the application is being processed.

- CLIENT_WAIT: cub_cas is waiting for a request from the application while a transaction is in progress.

- CLOSE_WAIT: cub_cas is waiting for a request from the application, but the transaction has been terminated. If the connection between cub_cas and the application is closed in this status, the application attempts to reconnect.
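The four statuses can be read as a function of three facts about a cub_cas: whether an application is connected, whether a request is currently being processed, and whether a transaction is open. The sketch below encodes only that reading; the names `CasStatus` and `cas_status` are hypothetical and do not come from the CUBRID source.

```python
from enum import Enum


class CasStatus(Enum):
    IDLE = "IDLE"
    BUSY = "BUSY"
    CLIENT_WAIT = "CLIENT_WAIT"
    CLOSE_WAIT = "CLOSE_WAIT"


def cas_status(connected, handling_request, in_transaction):
    """Derive the status from the connection and transaction state."""
    if not connected:
        return CasStatus.IDLE
    if handling_request:
        return CasStatus.BUSY
    # Connected and waiting for the next request from the application.
    return CasStatus.CLIENT_WAIT if in_transaction else CasStatus.CLOSE_WAIT
```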

After a connection to the application is established, the cub_cas process waits in the select() call and processes each function request sent by the application. The main functions that respond to requests from an application are as follows:

fn_end_tran

This function performs commit/rollback. If KEEP_CONNECTION is set to off in the cubrid_broker.conf file, it terminates the connection to the application when a transaction is terminated, and a new connection is established when a new transaction starts. If KEEP_CONNECTION is set to auto, the status of cub_cas changes to CLOSE_WAIT when a transaction is terminated. In this case, if the application connected to cub_cas has not sent a new request and a new application has sent a "connection" request, the cub_broker process can select a cub_cas whose status is CLOSE_WAIT, terminate its connection to the previous application, and ask it to accept the connection from the new application.

fn_prepare

This function processes a prepare request from an application. It compiles the query, creates a handle for the compiled query, and sends the handle to the application. The application then sends an execution request by using the handle. If the compiled query is a SELECT query, meta information on its columns is extracted and sent to the application.

fn_execute

This function executes a prepared query statement. For a SELECT statement, it sends as many query results as fit in the specified buffer size; for other statements, it sends the execution result. If the JDBC RESULT CACHE is in use and the executed query already exists in it, this function determines whether the stored query results can be reused. If they can, the query results are not sent; only a flag indicating reusability is sent to JDBC.

fn_fetch

This function copies the query results of a SELECT statement, up to the specified buffer size, and sends them to the application.
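The prepare/execute/fetch sequence described above can be pictured with a toy model. This is only a sketch under strong assumptions: `ToyCas` is a hypothetical class, a "compiled query" is represented by a ready-made list of rows, and the buffer size is expressed as a row count rather than bytes.

```python
import itertools


class ToyCas:
    def __init__(self, batch_rows):
        self.batch_rows = batch_rows      # stand-in for the buffer size
        self._handles = {}
        self._ids = itertools.count(1)

    def prepare(self, columns, rows):
        """Register a 'compiled' query; return its handle and column metadata."""
        handle = next(self._ids)
        self._handles[handle] = {"rows": list(rows), "pos": 0}
        return handle, list(columns)

    def execute(self, handle):
        """Start execution and return the first batch of results."""
        self._handles[handle]["pos"] = 0
        return self.fetch(handle)

    def fetch(self, handle):
        """Return the next batch, at most batch_rows rows."""
        state = self._handles[handle]
        start = state["pos"]
        batch = state["rows"][start:start + self.batch_rows]
        state["pos"] = start + len(batch)
        return batch
```

An application would call `prepare` once, `execute` with the returned handle, and then `fetch` repeatedly until an empty batch comes back.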

3. Client and Server Modules

This chapter describes the components of the entire server (hereinafter, the server) and of the native C API and other modules in the Client Library of the Broker (hereinafter, the client), as shown in Figure 4.

Figure 4. Detailed Architecture of the CUBRID System

3.1 Module Configuration

The CUBRID client and server modules consist of the following components:

Transaction Management Component

Handles system transactions across the client and server (including system failover).

Server Storage Management Component

Accesses and manages database and log volume on the server (including page buffering).

Client Storage Management Component

Allocates and manages a workspace for the object cache and access on the client.

Object Management Component

Defines a class object, creates and modifies an object, converts the object representation structure between the

disk and the memory.

Client-Server Communications

Manages the network communication between the client and the server.

Thread Management

Manages threads of a server process.

Query Processing

Executes query plans on the server, which are created by translating, analyzing and optimizing SQL statements

on the client.

The module configuration of each component is described in the following section.

3.1.1 Transaction Management Component

The Transaction Management Component consists of the modules in dark blue in Figure 5.


Figure 5. Module Configuration of Transaction Management Component

Object Locator

As a module passing object data between the workspace on a client and the page buffer pool on the server, it caches objects in the workspace and acquires locks on them.

Transaction Manager

As a module performing transaction start, commit, and rollback, it initializes other modules (lock/log/recovery

manager) of Transaction Management Component. This module also supports commit, rollback, and savepoint

including 2PC (2-phase commit).

Lock Manager

As a module performing lock management based on the 2PL (2-phase locking) protocol, it supports the granularity locking protocol.

Recovery Manager

As a module protecting database consistency against system failures, it employs a recovery method that uses UNDO/REDO logging and the WAL (Write Ahead Logging) protocol. This module supports total rollback, partial rollback (to a savepoint), and nested top operations, and uses LSAs (Log Sequence Addresses), CLRs (Compensation Log Records), and so on.

3.1.2 Server Storage Management Component

The Server Storage Management Component consists of the modules shown in Figure 6.

Figure 6. Module Configuration of Server Storage Management Component

I/O Manager

As a module performing I/O tasks for the disk volume (or volume file), it performs a volume mount/unmount

process and locks a volume. This module performs write synchronization for a log volume.

Page Buffer Management

As a module managing the page buffer in a virtual memory that is used for disk page buffering, it employs the

LRU page replacement algorithm and the FIX/UNFIX protocol to use page buffer. In addition, this module uses a

hash table to quickly retrieve a requested page in the buffer pool.

Disk Manager

This module manages the internal structure of the disk volume (or volume file). A volume consists of sectors, and a sector is a group of contiguous pages. Each volume consists of a system area and a user area. A bit allocation map is used for page allocation in the volume.


File Manager

As a module that allows the database to be accessed purely in terms of files and pages, regardless of the internal structure of the volume (volume, sector, and page), it is used by file structures such as the B+-tree, heap, and hash. The File Manager keeps and manages information on the sectors allocated to a file in the file header.

Slotted Page Manager

As a module inserting, deleting and updating records in a file page, it provides slot structure that indicates the

position (offset) of records in a page; it can move records in a page through a slot.

Overflow Page Manager

A module inserting, deleting, and updating records that are larger than one page in an overflow page area. With this module, large data can be handled atomically.

Object Heap Manager

This module inserts, deletes, and updates objects in a file through the heap structure. The instances (records) of a class (table) are stored in an object heap file, and a unique OID (object identifier) is allocated to each record. The OID consists of "Volume ID + Page ID + Slot ID," and it is not reused except in special cases. This OID representation is the same as the disk addressing used in the Disk Manager. That is, the OID indicates the physical location on disk where a record is stored.

Extendible Hash Manager

As a module providing the extendible hashing to access data quickly, it is used to retrieve class OIDs with a class

name.

B+-tree Manager

As a module providing an index file structure based on the prefix B+-tree, it inserts, deletes, and retrieves a key

for B+-tree.

Long Data Manager

As a module processing ad-hoc large objects such as multimedia data, it can modify part of the data.
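To make the Page Buffer Management description more concrete, the following sketch combines the three ideas it names: a hash table for page lookup, LRU replacement, and a FIX/UNFIX protocol that keeps pages in use from being evicted. It is a minimal model, not the CUBRID buffer manager; the class `PageBuffer`, its methods, and the `read_page` callback are assumptions made for the example.

```python
from collections import OrderedDict


class PageBuffer:
    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page     # callable: page_id -> page contents
        # Hash table of frames, ordered from least to most recently used.
        self.frames = OrderedDict()    # page_id -> [contents, fix_count]

    def fix(self, page_id):
        """Pin a page in the buffer, loading it first if necessary."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)      # now most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [self.read_page(page_id), 0]
        self.frames[page_id][1] += 1
        return self.frames[page_id][0]

    def unfix(self, page_id):
        """Release one pin; an unpinned page may be replaced later."""
        frame = self.frames[page_id]
        if frame[1] == 0:
            raise ValueError("page is not fixed")
        frame[1] -= 1

    def _evict(self):
        # Replace the least recently used page that nobody has fixed.
        for victim, (_contents, fix_count) in self.frames.items():
            if fix_count == 0:
                del self.frames[victim]
                return
        raise RuntimeError("all pages are fixed")
```

A real buffer manager would also write a dirty victim page back to disk before reusing its frame; that step is left out here.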

3.1.3 Client Storage Management Component

The Client Storage Management Component consists of the modules shown in Figure 7.


Figure 7. Module Configuration of Client Storage Management Component

Workspace Manager

A module managing the database objects cached in the workspace of the client process. Through an object table implemented as a hash table, it converts a disk object identifier (OID) to a memory object pointer (MOP). The MOP holds a memory pointer that gives access to the object cached in client memory.

Garbage Collector

A module collecting garbage in the client workspace. This module releases the memory allocated to MOPs and cached objects.

Quick Fit Storage Allocator

A module allocating memory in the workspace for objects.
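The OID-to-MOP mapping kept by the Workspace Manager can be sketched as a hash table keyed by the three OID components mentioned in the Object Heap Manager description. The class and attribute names below (`OID`, `MOP`, `Workspace`, `mop_for`, `decache`) are illustrative assumptions, not CUBRID identifiers.

```python
from collections import namedtuple

# An OID addresses a record by volume, page, and slot.
OID = namedtuple("OID", ["volume_id", "page_id", "slot_id"])


class MOP:
    """Memory object pointer: a stable handle for one database object."""

    def __init__(self, oid):
        self.oid = oid
        self.obj = None       # the cached in-memory object, if loaded


class Workspace:
    def __init__(self):
        self._object_table = {}    # hash table: OID -> MOP

    def mop_for(self, oid):
        """Return the single MOP for this OID, creating it on first use."""
        mop = self._object_table.get(oid)
        if mop is None:
            mop = MOP(oid)
            self._object_table[oid] = mop
        return mop

    def decache(self, oid):
        """Drop the cached object but keep the MOP valid for later reloads."""
        mop = self._object_table.get(oid)
        if mop is not None:
            mop.obj = None
```

Because every lookup of the same OID returns the same MOP, application code can hold on to a MOP while the object behind it is loaded, de-cached, and loaded again.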

3.1.4 Object Management Component

The Object Management Component consists of the modules in Figure 8.


Figure 8. Module Configuration of Object Management Component

Representation Manager

This module converts an object between its disk representation and its memory representation. On disk, object data is laid out in a form suited to query execution; in memory, it has a structure that is easy for an application to access. The Representation Manager converts between these two formats. It also handles byte ordering during conversion.

Schema Manager

As a module defining and changing a class, it creates, modifies, or manages the inheritance of a column, method,

or class.

Object Access Manager

As a module creating, deleting, modifying, checking an object or calling a method, it is closely related to the

Schema Manager.

Dynamic Loader

A module providing a dynamic link to an application that is executing methods written in C.

Trigger Manager

A module implementing a trigger feature with a system object. This module is closely related to the Schema

Manager and Object Access Manager.


Authorization Manager

A module checking the authority of a database user. This module is implemented on top of the API provided by

the Object Access Manager.

Data Type and Domain

A module manipulating internal data structure (representation format) for data type and domain information. This

module caches the information about the used domain to a connection list and has a domain conversion matrix.

3.1.5 Client-Server Communications

Client-Server Communications consists of the modules in Figure 9.

Figure 9. Module Configuration of Client-Server Communications

Socket Manager

A module managing communications in the client, the server and the master process (cub_master). This module

manages the procedures of connection to the client or server through the master process.

Packet Manager

A module processing the packets used to exchange information between the client and the server. The packet types are request, data, close, out-of-band, and error packets. Request packets and data packets can be exchanged asynchronously by using queues in the client and the server.

Client-Server Interface

A module providing the interface through which the rest of the system uses Client-Server Communications. This module handles exceptions that occur during communication as well as out-of-band events such as user interrupts.


3.1.6 Thread Management Component

The Thread Management Component manages the multiple threads in the server process; it is implemented by using pthread. This component detects requests from clients by using the select() system call and assigns a task to a thread for each request. A worker thread that processes client requests waits for a task in the Job Queue and wakes up when a task enters the queue. After it processes the task, it waits for another task in the Job Queue. In addition to the worker threads, there are system threads that handle only special system tasks:

Deadlock detection thread

This thread checks for deadlocks at fixed intervals or when a lock is requested, and resolves any deadlock it finds.

Checkpoint thread

At fixed intervals, this thread performs a checkpoint, which flushes data pages that have been committed but are still only cached in the page buffer and not yet written to disk. Periodic checkpoints reduce the recovery time after a failure.

OOB (out-of-band) thread

This thread receives the OOB signal and passes it to the appropriate thread.

Page-flush thread

This thread periodically flushes the dirty pages in the page buffer to the disk. This improves system performance by reducing the number of dirty pages that must be flushed during page replacement.

Log-flush thread

This thread flushes log pages to the log volume. Group commit and asynchronous commit are provided by using this thread.

3.1.7 Query Processing

Query Processing consists of the following modules.

Scanner/Parser

As a module translating queries (SQL) from users or applications, it creates a parse tree.

Semantic Checker

A module performing node typing, name resolution, semantic checking, or view translation, etc.

XASL Generator/Optimizer

A module creating an XASL (eXtended Access Specification Language) tree, which is a query execution plan, and performing query optimization by using schema information and database statistics. The XASL tree includes scan information (heap scan, index scan, list file scan, set scan, and method scan), a value list (the values required for the query results), and predicates. Query optimization combines cost-based optimization and rewrite optimization.

Query Manager

A server module executing an XASL tree received from the client. This module consists of the Query File Manager, which stores the query's XASL plan and its results, and the Query Evaluator, which evaluates queries and creates a result list file. This module interfaces with the Transaction Manager and the Recovery Manager to commit or roll back a transaction.

Cursor Manager

A module fetching data from the list file that is created as the retrieval results.

3.2 Detailed Description for the Modules

3.2.1 Transaction Management Component

A. Object Locator

The Object Locator is a module that moves object data between the workspace on a client and the page buffer pool on the server. It provides concurrent access to, use of, and recovery for database objects by using the locking and recovery algorithms of the Transaction Management Component.

The Object Locator is divided into a client part, a server part, and a client/server part. The Client Object Locator executes its tasks by using the Workspace Manager, the Representation Manager (Transformation Manager), and the Heap File Manager. The Authorization Manager, Schema Manager, Object Access Manager, and Query Parser (Scanner/Parser) use the functions of the Client Object Locator. The Server Object Locator executes its tasks by using the Object Heap Manager, the Representation Manager (Transformation Manager), the Lock Manager, the Catalog Manager, and the B+-tree Manager. The Client Object Locator uses the functions of the Server Object Locator for object fetch and flush.

The objects that the Object Locator caches in a client's workspace are kept coherent with the objects on the server by means of a cache coherency number. If the cache coherency number of an object cached in the client's workspace differs from that of the same object in the server's page buffer (or on disk), the cached object is invalidated. The Server Object Locator increases the cache coherency number of an object whenever the object is flushed from a client and sent to the server.

A cached object is validated when it is first used by a transaction. Because the lock is cached (set up) together with the object, the validation remains effective for the duration of that transaction. When a transaction requests an object, the Client Object Locator checks whether the object and its lock are cached. If both are cached, the transaction uses the cached object in workspace memory, which is much faster. Otherwise, the Client Object Locator sends a request to the Server Object Locator. The Server Object Locator acquires the requested lock on the object by using the Lock Manager. Once the lock is acquired, the cache coherency number of the object in the workspace is compared with that of the object in the server's database (page buffer or disk). If the two values differ, the new object data is sent from the server to the client and replaces the old cached object.
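The validation protocol above can be sketched as follows. This is a simplified model built on assumptions: `ServerLocator` and `ClientLocator` are hypothetical classes, the lock request is reduced to adding the OID to a set, and the cache coherency number (ccn) is a plain integer incremented on every flush.

```python
class ServerLocator:
    def __init__(self):
        self.store = {}                  # oid -> (ccn, data)

    def flush(self, oid, data):
        """Store new object data and bump its cache coherency number."""
        ccn = self.store.get(oid, (0, None))[0] + 1
        self.store[oid] = (ccn, data)
        return ccn

    def fetch_if_stale(self, oid, cached_ccn):
        """Return (ccn, data) only if the client's copy is out of date."""
        ccn, data = self.store[oid]
        if ccn == cached_ccn:
            return None
        return ccn, data


class ClientLocator:
    def __init__(self, server):
        self.server = server
        self.cache = {}                  # oid -> (ccn, data)
        self.locked = set()              # locks cached for the current transaction

    def get(self, oid):
        if oid in self.cache and oid in self.locked:
            return self.cache[oid][1]    # fast path: object and lock are cached
        self.locked.add(oid)             # stand-in for a lock request to the server
        cached_ccn = self.cache.get(oid, (None, None))[0]
        fresh = self.server.fetch_if_stale(oid, cached_ccn)
        if fresh is not None:
            self.cache[oid] = fresh      # replace the old cached object
        return self.cache[oid][1]

    def end_transaction(self):
        self.locked.clear()              # cached locks are released
```

In the real system another client could not change an object while this client holds a lock on it; the sketch does not model lock conflicts and only shows when the coherency numbers are compared.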

When a transaction is terminated, the cached objects are flushed to the server. When a transaction is rolled back, the objects are all de-cached. In addition, when a class object is invalidated (e.g., its schema is changed by a transaction of another client), all the instance objects of the class are flushed or de-cached together. All objects are also flushed to the server together with query execution requests, because queries are executed on the server.

To reduce the amount of communication between a client and the server, the Object Locator sends flush data together with object fetch request packets, and pre-fetches related class objects or other neighboring objects when caching objects.


The Server Object Locator fetches an object from database and updates it to the database upon the request of Client

Object Locator by using the Heap File Manager. In addition, it manages lock setting by using the Lock Manager.

B. Transaction Manager

The Transaction Manager is a module that performs transaction start, commit, rollback, and so on. It calls the Object Locator to flush the objects used in a transaction, the Lock Manager to release cached locks, and the Log Manager (Recovery Manager) for transaction commit or rollback.

The Transaction Manager is divided into a client part and a server part. When an application requests transaction termination (commit or rollback), the Client Transaction Manager flushes the objects in the workspace that were changed during the transaction to the page buffer of the server. (For a rollback request, the changed objects are not flushed to the server; they are immediately removed from the workspace.) Next, the Client Transaction Manager sends the commit or rollback request to the Server Transaction Manager. For a commit, the Server Transaction Manager calls the Log Manager (Recovery Manager), which executes postpone actions on the database in the server and loose_end postpone actions in the client. After that, it releases all the acquired locks and closes all the open cursors. For a rollback, the Log Manager (Recovery Manager) reverts the work done by the transaction by using the UNDO log and releases all the acquired locks. When a transaction is committed or rolled back, all the locks cached by the Client Transaction Manager are released.

The Transaction Manager supports the 2PC (2-phase commit) protocol for global transactions.

C. Lock Manager

The Lock Manager is a module that manages locks according to the 2PL (2-phase locking) protocol and the granularity locking protocol. The Lock Manager looks up a transaction identifier, calls the Log Manager (Recovery Manager) to get the lock waiting time of the transaction, and calls the Server Transaction Manager to roll back a transaction when handling a deadlock. The Server Object Locator uses the Lock Manager to acquire and release locks on objects, and the Log Manager uses the Lock Manager to release all locks at once.

When an instance object is accessed, locks must be set on the class object that defines the attributes of the instance and on the inherited superclass objects. When the schema of a class object is changed, an exclusive (X) lock must be set on the class and its subclasses.

During query execution, the instances of a class and the instances of its subclasses are all searched. In addition, because a class object is a domain that defines the corresponding instances, the domain class and its subclasses are all accessed. Therefore, during query execution, shared locks are set on the class being searched and its subclasses, and also on the domain class that defines an instance and its subclasses.

To detect deadlocks, the WFG (Waits-For Graph) method is used. If the WFG reveals a deadlock, the system forcibly terminates one of the transactions involved.
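
As an illustration of the Waits-For Graph method, the following minimal sketch treats each transaction as a node and each "waits for" relation as an edge, then looks for a cycle with a depth-first search. The function names, the dictionary representation of the graph, and the victim policy (abort the transaction with the largest identifier) are assumptions made for this example only; the document does not state how CUBRID selects the transaction to terminate.

```python
from typing import Dict, List, Optional, Set


def find_deadlock_cycle(waits_for: Dict[int, Set[int]]) -> Optional[List[int]]:
    """Return one cycle of transaction ids in the waits-for graph, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, on the current path, finished
    color: Dict[int, int] = {}
    path: List[int] = []

    def visit(tx: int) -> Optional[List[int]]:
        color[tx] = GRAY
        path.append(tx)
        for blocker in sorted(waits_for.get(tx, ())):
            state = color.get(blocker, WHITE)
            if state == GRAY:
                # The blocker is already on the current path: a cycle exists.
                return path[path.index(blocker):]
            if state == WHITE:
                cycle = visit(blocker)
                if cycle is not None:
                    return cycle
        path.pop()
        color[tx] = BLACK
        return None

    for tx in sorted(waits_for):
        if color.get(tx, WHITE) == WHITE:
            cycle = visit(tx)
            if cycle is not None:
                return cycle
    return None


def choose_victim(cycle: List[int]) -> int:
    """Hypothetical policy: abort the transaction with the largest id."""
    return max(cycle)


# Transaction 1 waits for 2, 2 waits for 3, and 3 waits for 1: a deadlock.
graph = {1: {2}, 2: {3}, 3: {1}, 4: {1}}
cycle = find_deadlock_cycle(graph)
victim = choose_victim(cycle)
```

Terminating any one transaction on the cycle removes at least one edge and breaks the deadlock; which one is cheapest to terminate is a policy decision outside the scope of this sketch.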

The Lock Manager manages the Lock Table. The Lock Table is implemented as a hash table keyed by OID, and access to the table is protected as a critical section to maintain consistency.
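
The idea of a lock table that is hashed by OID and guarded by a critical section can be sketched as follows. This is a simplified model, not CUBRID's implementation: the class name, the method names, the tuple used as an OID, and the restriction to only shared (S) and exclusive (X) modes are all assumptions of the example. A Python dictionary plays the role of the hash table and a mutex plays the role of the critical section.

```python
import threading
from collections import defaultdict

# Only shared-with-shared is compatible in this two-mode sketch.
COMPATIBLE = {("S", "S")}


class LockTable:
    """Hypothetical lock table: a hash table keyed by OID, guarded by a mutex."""

    def __init__(self):
        self._mutex = threading.Lock()        # the critical section
        self._holders = defaultdict(dict)     # oid -> {transaction id: mode}

    def try_lock(self, oid, tx_id, mode):
        """Grant the lock if it is compatible with all other holders."""
        with self._mutex:
            holders = self._holders[oid]
            for other_tx, held_mode in holders.items():
                if other_tx != tx_id and (held_mode, mode) not in COMPATIBLE:
                    return False
            # Never downgrade an exclusive lock the transaction already holds.
            if holders.get(tx_id) != "X":
                holders[tx_id] = mode
            return True

    def release_all(self, tx_id):
        """Release every lock of a transaction at once (commit or rollback)."""
        with self._mutex:
            for oid in list(self._holders):
                self._holders[oid].pop(tx_id, None)
                if not self._holders[oid]:
                    del self._holders[oid]


table = LockTable()
oid = (0, 100, 1)   # illustrative OID value; any hashable key works here
```

A real lock manager would queue incompatible requests and make the requester wait up to its lock wait time instead of returning False immediately; the sketch omits waiting so that it stays short.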

D. Recovery Manager

The Recovery Manager ensures that, when a transaction, system, or media failure occurs, the effects of all committed transactions are reflected in the database and the effects of uncommitted transactions are not. To do this, the Recovery Manager records a log and uses it to restore the database after various kinds of failure. The CUBRID Recovery Manager uses the UNDO/REDO recovery protocol, which is based on the following rules:

UNDO Rule

Record a data value before it is changed. This guarantees that the last committed value is recorded in the log before it is overwritten by a value that is not yet committed.

REDO Rule

The values updated by a transaction are guaranteed to be recorded in the log before the transaction is committed. That is, the new values are written to the log before the commit completes.

A log is a file to which records of arbitrary length are appended. To emulate a log file of unbounded length, recent log data is recorded in an active log and older log data is moved to an archive log.

UNDO/REDO logging is designed to maximize efficiency during normal operation rather than to minimize recovery time after a database system failure. Thanks to this logging protocol, flushing data pages during commit or rollback can be avoided as much as possible. A data page is written to disk only when it is replaced by another page.
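
The two rules and the "no flush at commit" property can be illustrated with a toy model. The sketch below is not CUBRID's recovery code: the class, its method names, and the tuple layout of log records are invented for the example, and it assumes strict two-phase locking so that an uncommitted transaction is always the last writer of any key it touched. Every update logs both the old value (UNDO rule) and the new value (REDO rule) before the cached copy changes, commit only appends a log record, and pages reach disk only when evicted.

```python
class ToyDatabase:
    """Hypothetical UNDO/REDO logging model; all names are illustrative."""

    def __init__(self):
        self.disk = {}    # durable data pages (key -> value)
        self.log = []     # durable append-only log
        self.cache = {}   # volatile buffered values, lost on a crash

    def write(self, tx, key, new_value):
        old_value = self.cache.get(key, self.disk.get(key))
        # Log the old and new values BEFORE changing the cached data.
        self.log.append(("update", tx, key, old_value, new_value))
        self.cache[key] = new_value

    def commit(self, tx):
        # Commit only appends to the log; no data page is flushed.
        self.log.append(("commit", tx))

    def evict(self, key):
        # A page reaches disk only when it is replaced in the buffer.
        self.disk[key] = self.cache.pop(key)

    def crash(self):
        self.cache.clear()

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        # REDO pass (forward): reapply the updates of committed transactions.
        for rec in self.log:
            if rec[0] == "update" and rec[1] in committed:
                _, _, key, _, new_value = rec
                self.disk[key] = new_value
        # UNDO pass (backward): restore old values of uncommitted updates.
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old_value, _ = rec
                if old_value is None:
                    self.disk.pop(key, None)
                else:
                    self.disk[key] = old_value


db = ToyDatabase()
db.write(1, "a", 10)
db.commit(1)            # committed, but "a" was never flushed to disk
db.write(2, "b", 20)
db.evict("b")           # uncommitted value reached disk before the crash
db.crash()
db.recover()
```

After recovery the committed value of "a" is present even though its page was never flushed, and the uncommitted value of "b" has been removed even though its page was flushed.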

3.2.2 Object Management Component

The Object Management Component defines a table, creates or modifies an object, and formats an object on disk or in memory.

A. Representation Manager

This module converts an object between its disk expression structure and its memory expression structure. On disk, object data has a structure suited to query execution; in memory, it has a structure that makes it easy for an application to access. The Representation Manager converts between these two expression formats.

Figure 10. Disk Expression Format of an Object

The disk expression format of an object is shown in Figure 10. The class OID and Representation ID of the object come first, and they are used to determine which format the object has. The CHN (Cache Coherency Number) that follows is used to judge the validity of a cached object. In the disk expression format, the columns (attributes) are divided into fixed length columns, where all values have the same length, such as integers, and variable length columns, where values have different lengths, such as strings. Fixed length columns are stored at pre-defined locations, and the location of each column is obtained from information managed by the Catalog Manager. The location of a variable length column is obtained from the variable length column offset table, which holds the location of each variable length column. The last entry of the offset table indicates the end of the object. The offset table is not stored for objects of a table that has no variable length column.
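
To make the layout concrete, here is a minimal sketch of a record with a header, a variable length column offset table whose last entry marks the end of the object, fixed length columns at computable positions, and variable length data. The field widths, the byte order, the position of the offset table relative to the fixed length columns, and all function names are assumptions chosen for the example; the actual on-disk byte layout used by CUBRID is not specified in this document.

```python
import struct

# Hypothetical header: class OID (8 bytes), Representation ID (4), CHN (4).
HEADER = struct.Struct("<QII")


def offset_table_size(n_var):
    """The table has one entry per variable column plus an end-of-object entry."""
    return 4 * (n_var + 1) if n_var else 0


def pack_object(class_oid, repr_id, chn, fixed_ints, var_strings):
    """Serialize one object: header, offset table, fixed columns, variable data."""
    fixed = struct.pack(f"<{len(fixed_ints)}i", *fixed_ints)
    var_bytes = [s.encode("utf-8") for s in var_strings]
    pos = HEADER.size + offset_table_size(len(var_bytes)) + len(fixed)
    offsets = []
    for data in var_bytes:
        offsets.append(pos)
        pos += len(data)
    if var_bytes:
        offsets.append(pos)   # last entry: end of the object
    table = struct.pack(f"<{len(offsets)}I", *offsets)
    return HEADER.pack(class_oid, repr_id, chn) + table + fixed + b"".join(var_bytes)


def read_fixed_int(record, n_var, index):
    """Fixed columns sit at a position computable from catalog information."""
    base = HEADER.size + offset_table_size(n_var)
    return struct.unpack_from("<i", record, base + 4 * index)[0]


def read_var_column(record, index):
    """A variable column spans from its own offset to the next table entry."""
    start, end = struct.unpack_from("<2I", record, HEADER.size + 4 * index)
    return record[start:end].decode("utf-8")


record = pack_object(7, 1, 0, [42, -5], ["ab", "xyz"])
no_var = pack_object(7, 1, 0, [1], [])   # no offset table is stored
```

Because each variable length column ends where the next offset begins, the extra end-of-object entry is what lets the last variable column be read without storing explicit lengths.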

When an object is cached in memory, the MOP points to a memory block that holds the columns of the object. The fixed length column values are stored contiguously in the object block, and the values of variable length columns are stored in separately allocated memory blocks. The CHN is also included in the memory expression format. The object locator compares this CHN value with the CHN value stored on disk to judge the validity of the cached object. If the two CHN values differ, the object cached in memory is no longer valid. In that case, the object locator de-caches the object and caches the new content of the object.
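
The CHN comparison can be sketched in a few lines. The names below, and the use of plain dictionaries for the workspace and for the disk, are assumptions of this example and do not reflect the actual object locator interface.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    chn: int        # Cache Coherency Number captured when the object was cached
    values: dict    # column name -> value


def fetch_valid(workspace, disk, oid):
    """Return a valid cached object, re-caching it if the CHN values differ."""
    disk_chn, disk_values = disk[oid]
    cached = workspace.get(oid)
    if cached is None or cached.chn != disk_chn:
        # Stale or missing: de-cache and cache the current content.
        cached = CachedObject(disk_chn, dict(disk_values))
        workspace[oid] = cached
    return cached


disk = {1: (0, {"name": "a"})}
workspace = {}
first = fetch_valid(workspace, disk, 1)
again = fetch_valid(workspace, disk, 1)      # same CHN: the cache is reused
disk[1] = (1, {"name": "b"})                 # an update increments the CHN
refreshed = fetch_valid(workspace, disk, 1)  # CHN differs: object is re-cached
```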

Figure 11. Memory Expression Format of an Object

The Representation Manager uses the Workspace Manager to obtain storage space for the memory expression of an object and uses the Schema Manager to determine the size and structure of an object.

When CUBRID changes a schema, it does not rewrite the existing records of that schema into the new expression format. Therefore, if an object stored in an old expression format is found while converting between the two expression formats, it is converted to the most recent expression format. The schema information for both the recent and the old expression format is used for this conversion. Differences in hardware architecture between the client machine and the server machine, such as byte ordering, are also converted during this process.

3.2.3 Query Processing

Figure 12. The Procedures of Query Compile in a Client

A. Scanner/Parser

The parser maintains the data structures used to build a parse tree during parsing, to hold the parse tree once it is built, and to manage multiple SQL statements, as well as information about the lexer.

B. Semantic Checker

If a parse tree is built without an error, the input query statement is syntactically correct. Semantic checking then verifies whether the semantics of the input statement are valid. It performs the following tasks:

1. Name resolution and parse tree node type checking

Checks that the tables and columns referenced in the statement exist, and infers the type of each column.

2. Semantic checking

Checks whether an operation that is not supported between types is used.

3. View translation

Converts the definition statement of a view.

C. XASL Generator/Optimizer

The query statement input by a user goes through parsing and semantic checking and is then converted into an augmented parse tree annotated with catalog information. Query optimization is performed on this augmented parse tree, and the result is the XASL tree, i.e. the action plan. The XASL tree specifies the optimal access sequence and access method for the tables accessed during query execution. It consists of the action plan that has the lowest access path cost among the possible plans. From a parse tree and catalog statistics, an XASL tree is created as follows:

1. Classifying terms to configure search conditions in table units

A term is a search condition on one or more tables. If the term applies to one table, it is a scan term (sarg). If it applies to two tables, it is a join term (edge). If it applies to three or more tables, it is an other term. Divide the terms specified in the where clause of the parse tree into join terms and scan terms, and classify the scan terms according to the table to which each term applies.

2. Determining the most optimized access method to each table

For the scan terms that apply to a given table, calculate the selectivity of each scan term and use the search method of the term with the lowest selectivity as the search method for the table. That is, determine whether to use a sequential scan or an index scan for the table. If an index scan is used, determine which index to use.

3. Calculating selectivity for each table

Calculate the selectivity of each table by using the selectivity of each scan term calculated in step 2.

4. Determining access sequence among tables

To determine the access sequence among tables, enumerate candidate access sequences and calculate the access path cost of each one. Select the sequence with the lowest access cost as the final execution plan.

5. Creating XASL tree for the final execution plan
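
Steps 1 to 4 above can be sketched as four small functions. Everything in this example is hypothetical: the dictionary shape of a term, the rule that a table's selectivity is the product of its scan term selectivities (which assumes independent predicates), and the cost model (the sum of intermediate result sizes, ignoring join selectivity) are simplifications chosen for illustration and are not CUBRID's actual optimizer or cost formulas.

```python
from itertools import permutations


def classify_terms(terms):
    """Step 1: split terms into scan terms (per table), join terms, other terms."""
    scan, join, other = {}, [], []
    for term in terms:
        tables = term["tables"]
        if len(tables) == 1:
            scan.setdefault(next(iter(tables)), []).append(term)
        elif len(tables) == 2:
            join.append(term)
        else:
            other.append(term)
    return scan, join, other


def choose_access(table, scan_terms, indexes):
    """Step 2: follow the scan term with the lowest selectivity."""
    if not scan_terms:
        return ("sequential scan", None)
    best = min(scan_terms, key=lambda term: term["selectivity"])
    if best["column"] in indexes.get(table, ()):
        return ("index scan", best["column"])
    return ("sequential scan", None)


def table_selectivity(scan_terms):
    """Step 3: combine scan term selectivities (assumes independence)."""
    selectivity = 1.0
    for term in scan_terms:
        selectivity *= term["selectivity"]
    return selectivity


def best_access_sequence(row_counts, selectivities):
    """Step 4: enumerate table orders and keep the cheapest one."""
    def cost(order):
        total, rows = 0.0, 1.0
        for table in order:
            rows *= row_counts[table] * selectivities[table]
            total += rows          # hypothetical cost: intermediate result size
        return total
    return min(permutations(sorted(row_counts)), key=cost)


terms = [
    {"tables": {"a"}, "column": "x", "selectivity": 0.1},
    {"tables": {"a"}, "column": "y", "selectivity": 0.5},
    {"tables": {"a", "b"}, "column": None, "selectivity": 0.2},
    {"tables": {"a", "b", "c"}, "column": None, "selectivity": 0.9},
]
scan, join, other = classify_terms(terms)
access = choose_access("a", scan["a"], {"a": {"x"}})
order = best_access_sequence(
    {"a": 1000, "b": 10, "c": 100},
    {"a": 0.1, "b": 1.0, "c": 0.5},
)
```

Enumerating every permutation is only practical for a handful of tables; it is used here because it mirrors the description in step 4 most directly.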

D. Query Manager

This is a server module that executes an XASL tree received from a client. During query processing, a client sends the XASL tree created by the XASL Generator/Optimizer module to a server. The query is executed when the server receives and executes this XASL tree. Because it is undesirable, in terms of performance, to go through the XASL Generator/Optimizer every time a query of the same pattern is issued, CUBRID saves the XASL tree in the Query Plan Cache and reuses it. In addition, when the same query is executed repeatedly, CUBRID saves the query result in the Query Cache and returns that result the next time without executing the query.
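
The reuse of a compiled plan can be illustrated with a small cache keyed by the query text. This is only a sketch under stated assumptions: the class name, the whitespace-only normalization used as the cache key, and the stand-in compile function are hypothetical, and the document does not describe how CUBRID decides that two queries have the same pattern or when cached entries are invalidated.

```python
class PlanCache:
    """Hypothetical plan cache: compile once per query text, then reuse."""

    def __init__(self, compile_query):
        self._compile_query = compile_query   # stands in for the optimizer
        self._plans = {}

    @staticmethod
    def _key(sql):
        # Assumed normalization: collapse runs of whitespace only.
        return " ".join(sql.split())

    def get_plan(self, sql):
        key = self._key(sql)
        if key not in self._plans:
            self._plans[key] = self._compile_query(key)
        return self._plans[key]


compiled = []


def fake_compile(sql):
    """Stand-in for the XASL Generator/Optimizer; records each compilation."""
    compiled.append(sql)
    return ("XASL", sql)


cache = PlanCache(fake_compile)
plan1 = cache.get_plan("SELECT  *  FROM t")
plan2 = cache.get_plan("SELECT * FROM t")     # reuses the cached plan
plan3 = cache.get_plan("SELECT * FROM u")     # different query: compiled anew
```

A result cache follows the same shape, with the query result stored instead of the plan, but it must also be invalidated whenever the underlying tables change.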

Figure 13. Query Execution on the Server

The procedure of query processing through these components is shown in Figure 14.

Figure 14. Query Execution Steps