Database Scalability - The Shard Conflict

Database Scalability: The Shard Conflict July 2014




This presentation tackles a particularly challenging situation that often arises when building a distributed relational database.

In this presentation you will learn:

• What a ‘shard conflict’ is
• How to identify ‘shard conflicts’
• How to resolve ‘shard conflicts’ in a distributed database
• How ‘shard conflicts’ affect query processing


Traditional Databases vs. Distributed Databases

Traditional Monolithic DB

• Made up of tables of data that are related to one another
• All of the data is located in one place and is easily accessible
• Relationships between the data are maintained within a single database and can easily be analyzed and queried using conventional methods

Modern Distributed DB

• Data distribution is necessary for scalability
• Information is spread across various servers (instances)
• Related data can be distributed into different partitions, or shards, which makes queries over related data difficult to process
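To make the distributed side concrete, here is a minimal Python sketch of how rows might be routed to shards by a shard key. The modulo placement, the three-shard cluster, the sample rows, and the helper names `shard_for` and `distribute` are illustrative assumptions, not ScaleBase's actual routing logic.

```python
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Map an integer shard-key value to a shard number (simple modulo placement)."""
    return key % num_shards


def distribute(rows, shard_key, num_shards=NUM_SHARDS):
    """Group row dicts into per-shard lists according to the chosen shard-key column."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[shard_key], num_shards)].append(row)
    return dict(shards)


employees = [
    {"id": 1, "name": "Ann", "department_id": 10},
    {"id": 2, "name": "Bob", "department_id": 10},
    {"id": 3, "name": "Cal", "department_id": 20},
]

by_id = distribute(employees, "id")
for shard, rows in sorted(by_id.items()):
    print(shard, [r["name"] for r in rows])
# 0 ['Cal']
# 1 ['Ann']
# 2 ['Bob']
```

Ann and Bob belong to the same department, yet routing by ‘id’ places them on different shards, so any query that needs them together must span partitions.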


So, What Is a ‘Shard Conflict’?

At ScaleBase, we have coined the term ‘shard conflict’ to describe a situation where:

• A given statement cannot be executed unchanged on all partitions (or on a single partition) and still be relied upon to yield a correct result.
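The definition can be tested empirically. The sketch below is a hypothetical Python illustration, not ScaleBase's mechanism: it runs a query function unchanged on every shard, concatenates the results, and compares them (as multisets) with the result from a single unsharded database. Tables are modeled as lists of dicts, and the two-shard layout that splits each table by its own ‘id’ is an assumption.

```python
from collections import Counter


def scatter_gather(query, shards):
    """Run the same query, unchanged, on every shard and concatenate the results."""
    rows = []
    for shard in shards:
        rows.extend(query(shard))
    return rows


def has_shard_conflict(query, full_db, shards):
    """True if the scatter-gather result differs from the single-database result."""
    # Compare as multisets, because rows from different shards arrive in arbitrary order.
    return Counter(scatter_gather(query, shards)) != Counter(query(full_db))


def names_in_sales(db):
    # SELECT name FROM employee WHERE department_id = 10
    return [(e["name"],) for e in db["employee"] if e["department_id"] == 10]


def employee_with_department(db):
    # SELECT e.name, d.name FROM employee e JOIN department d ON e.department_id = d.id
    return [(e["name"], d["name"]) for e in db["employee"]
            for d in db["department"] if e["department_id"] == d["id"]]


full_db = {
    "employee": [{"id": 1, "name": "Ann", "department_id": 10},
                 {"id": 2, "name": "Bob", "department_id": 20}],
    "department": [{"id": 10, "name": "Sales"}, {"id": 20, "name": "R&D"}],
}
# Hypothetical two-shard layout: each table split by its own 'id' (id % 2).
shards = [
    {"employee": [e for e in full_db["employee"] if e["id"] % 2 == s],
     "department": [d for d in full_db["department"] if d["id"] % 2 == s]}
    for s in range(2)
]

print(has_shard_conflict(names_in_sales, full_db, shards))            # False
print(has_shard_conflict(employee_with_department, full_db, shards))  # True
```

A mismatch proves a conflict, but a match on one data set does not prove its absence, which is one reason these conflicts can slip through testing.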

Let’s take a look at the following examples…


Identifying the Conflict

Example #1

Choosing ‘id’ as the shard key for both the ‘Employee Table’ and the department table presents a shard conflict when the two tables are joined, because there is no guarantee that each employee is stored in the same shard as his or her department.
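Identifying the conflict can be as simple as comparing shard locations. Assuming both tables are sharded by their own ‘id’ with a hypothetical modulo-4 placement, this Python sketch lists employees that end up on a different shard than their department. Some rows happen to be colocated, which is part of why the problem is easy to miss, but nothing guarantees it.

```python
NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key, num_shards=NUM_SHARDS):
    """Simple modulo placement on an integer key (an assumed scheme)."""
    return key % num_shards


# Sample rows; the department table is assumed to be sharded by its own 'id'.
employees = [
    {"id": 1, "department_id": 1},
    {"id": 2, "department_id": 1},
    {"id": 5, "department_id": 1},
    {"id": 6, "department_id": 3},
]


def misplaced_employees(employees, num_shards=NUM_SHARDS):
    """Ids of employees (placed by 'id') stored apart from their department (placed by its 'id')."""
    return [
        e["id"]
        for e in employees
        if shard_for(e["id"], num_shards) != shard_for(e["department_id"], num_shards)
    ]


print(misplaced_employees(employees))  # [2, 6]
```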


Resolving the Conflict

Example #2

The Method:

• Choose ‘department_id’ as the shard key for the ‘Employee Table’

The Outcome:

• The join query is optimized, because all data related to a given department is stored in the same partition
• No join has to cross partition boundaries
• Statements can now safely be executed on all partitions
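A Python sketch of the fix, under similar assumptions (hypothetical three-shard modulo placement, sample rows): departments stay sharded by ‘id’, employees are sharded by ‘department_id’, and the join run independently on each shard reproduces the single-database result.

```python
from collections import Counter

NUM_SHARDS = 3  # hypothetical cluster size

departments = [{"id": 10, "name": "Sales"}, {"id": 11, "name": "R&D"}, {"id": 12, "name": "Ops"}]
employees = [
    {"id": 1, "name": "Ann", "department_id": 10},
    {"id": 2, "name": "Bob", "department_id": 11},
    {"id": 3, "name": "Cal", "department_id": 12},
    {"id": 4, "name": "Dee", "department_id": 10},
]


def join(emps, depts):
    # SELECT e.name, d.name FROM employee e JOIN department d ON e.department_id = d.id
    return [(e["name"], d["name"]) for e in emps for d in depts if e["department_id"] == d["id"]]


# Departments are placed by 'id'; employees by 'department_id', so matching rows share a shard.
shards = [
    ([e for e in employees if e["department_id"] % NUM_SHARDS == s],
     [d for d in departments if d["id"] % NUM_SHARDS == s])
    for s in range(NUM_SHARDS)
]

local_results = [row for emps, depts in shards for row in join(emps, depts)]
print(Counter(local_results) == Counter(join(employees, departments)))  # True
```

Colocation holds by construction here: an employee is placed by `department_id % NUM_SHARDS`, and its department by `id % NUM_SHARDS` with the same value, so every matching pair lands on the same shard.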


Wait a Minute... There’s Still a Conflict

SELECT e.first_name, e.last_name, m.first_name, m.last_name FROM employee e JOIN employee m ON e.manager_id = m.id

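This query joins the ‘Employee Table’ with itself to find each employee’s manager, but there is no guarantee that an employee and his or her manager are stored in the same shard (for example, when the manager belongs to a different department).

The ‘Employee Table’ cannot be sharded by both ‘id’ and ‘manager_id’ at the same time.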
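Continuing with the Example #2 layout (employee table sharded by ‘department_id’, hypothetical two-shard modulo placement), this Python sketch finds employee/manager pairs whose rows sit on different shards. The `manager_id` values and the helper names are sample assumptions.

```python
NUM_SHARDS = 2  # hypothetical cluster size

employees = [
    {"id": 1, "name": "Ann", "department_id": 10, "manager_id": None},  # top of the hierarchy
    {"id": 2, "name": "Bob", "department_id": 11, "manager_id": 1},
    {"id": 3, "name": "Cal", "department_id": 11, "manager_id": 2},
    {"id": 4, "name": "Dee", "department_id": 10, "manager_id": 1},
]


def shard_of(row):
    # Example #2 layout: the employee table is sharded by 'department_id'.
    return row["department_id"] % NUM_SHARDS


def cross_shard_manager_pairs(employees):
    """(employee, manager) name pairs whose rows live on different shards."""
    by_id = {e["id"]: e for e in employees}
    pairs = []
    for e in employees:
        m = by_id.get(e["manager_id"])  # None for employees without a manager
        if m is not None and shard_of(e) != shard_of(m):
            pairs.append((e["name"], m["name"]))
    return pairs


print(cross_shard_manager_pairs(employees))  # [('Bob', 'Ann')]
```

Bob works in another department than his manager, so the self-join would silently drop that pair if it ran only shard by shard.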


‘Shard Conflict’ Effects on Query Processing

• As the examples show, when two tables are related by a foreign key, sharding both on a common key can resolve certain (but not all) conflicts

• Working with distributed data can become quite complex if the distribution is not handled correctly

• Shard conflicts are not always obvious, and they can yield incorrect results that go unnoticed
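The pattern in the examples can be written down as a conservative rule: an equi-join can run unchanged on every shard when each side is sharded on its own join column using the same distribution function. The Python sketch below encodes only that rule; the `ShardedTable` type and the `scheme` label are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardedTable:
    name: str
    shard_key: str        # column the table is partitioned on
    scheme: str = "mod3"  # hypothetical label for the distribution function


def join_is_shard_local(left: ShardedTable, left_col: str,
                        right: ShardedTable, right_col: str) -> bool:
    """Equi-join left.left_col = right.right_col is shard-local if each side is
    partitioned on its join column with the same distribution function."""
    return (left.shard_key == left_col
            and right.shard_key == right_col
            and left.scheme == right.scheme)


department = ShardedTable("department", "id")
employee_by_id = ShardedTable("employee", "id")
employee_by_dept = ShardedTable("employee", "department_id")

print(join_is_shard_local(employee_by_id, "department_id", department, "id"))        # False (Example #1)
print(join_is_shard_local(employee_by_dept, "department_id", department, "id"))      # True (Example #2)
print(join_is_shard_local(employee_by_dept, "manager_id", employee_by_dept, "id"))   # False (self-join)
```

The self-join fails the check because a single table has a single shard key: it cannot be partitioned on ‘manager_id’ for one side of the join and on ‘id’ for the other.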


ScaleBase Can Help

ScaleBase is a modern, distributed MySQL database management system. It is optimized for the cloud and deploys in minutes to enable you to scale out to an unlimited number of users, data and transactions. It is a horizontally scalable database cluster built on MySQL that dynamically optimizes workloads and availability by logically distributing data across public, private and geo-distributed clouds.

Contact Us [email protected]

or download the free software:

ScaleBase Software: http://www.scalebase.com/software/

Use your relational DBA skills and get NoSQL capabilities


Start Using ScaleBase Today

Check out ScaleBase’s software

• ScaleBase on Amazon

• ScaleBase on Rackspace