Balancing Replication and Partitioning in a Distributed Java Database

Traditional disk-oriented database architecture is showing its age.

In-memory databases offer significant gains in performance as all data is freely available. There is no need to page to and from disk.

But three issues have stunted their uptake: address spaces only being large enough for a subset of a typical user’s data, the ‘one more bit’ problem, and durability.

Distributed in-memory databases solve these three problems, but at the price of losing the single address space.

This makes joins a problem. When data must be joined across multiple machines, performance degradation is inevitable.

Snowflake Schemas allow us to mix Partitioning and Replication so joins never hit the wire.

But this model only goes so far. “Connected Replication” takes us a step further, allowing us to make the best possible use of replication.

The lay of the land: the main architectural constructs in the database industry

Shared Disk

[Figure: latency scale in seconds, marked in ms, μs, ns and ps, showing L1 Cache Ref, L2 Cache Ref, Main Memory Ref, 1MB Main Memory, Cross Network Round Trip, Cross Continental Round Trip and 1MB Disk/Network]

* An L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel about 20cm.

Distributed Cache

Taken from “OLTP Through the Looking Glass, and What We Found There” Harizopoulos et al

Regular Database: Oracle, Sybase, MySQL

ODC

SN In-Memory: Exasol, VoltDB, Hana

Distributed Caching: Coherence, Gemfire, Gigaspaces

Shared Nothing: Teradata, Vertica, Greenplum…

In-Memory Database: TimesTen, HSQL, KDB

Distribute Drop Disk

Distributed Architecture

Simplify the Contract.

Stick to RAM

450 processes

Messaging (Topic Based) as a system of record (persistence)

2TB of RAM Oracle

Coherence

Data Layer

Transactions

Cashflows

Query Layer

Mtms

Access Layer

Java client API

Java client API

Persistence Layer

Indexing

Replication: but your storage is limited by the memory on a node

Partitioning: scalable storage, bandwidth and processing (Keys Fs-Fz, Keys Xa-Yd)

[Diagram: Trade, Party and Trader, labelled Version 1]


…and you need versioning to do MVCC

[Diagram: Trade, Party and Trader]

So better to use partitioning, spreading data around the cluster.
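A minimal sketch of what spreading data around the cluster means, assuming a simple hash-modulo placement. This is an illustration only and not how Coherence assigns partitions; the class name PartitionSketch and the node count are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: each key lives on exactly one node,
// so total capacity grows with the number of nodes.
final class PartitionSketch {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    PartitionSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Pick the owning node from the key's hash (floorMod avoids negative indexes).
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, String value) {
        nodes.get(nodeFor(key)).put(key, value);
    }

    String get(String key) {
        return nodes.get(nodeFor(key)).get(key);
    }

    int sizeOfNode(int node) {
        return nodes.get(node).size();
    }

    public static void main(String[] args) {
        PartitionSketch cluster = new PartitionSketch(4);
        for (int i = 0; i < 1000; i++) {
            cluster.put("trade-" + i, "payload-" + i);
        }
        for (int n = 0; n < 4; n++) {
            System.out.println("node " + n + " holds " + cluster.sizeOfNode(n) + " entries");
        }
    }
}
```

No single node has to hold all 1000 entries, which is the storage benefit. The cost is that two related entries may land on different nodes.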


This is what using Snowflake Schemas and the Connected Replication pattern is all about!

Common Keys: Trade (Partitioned)

Crosscutting Keys: Party, Trader (Replicated)

[Bar chart, axis 0 to 150,000,000, one bar per entity: Valuation Legs, Valuations, Part Transaction Mapping, Cashflow Mapping, Party Alias, Transaction, Cashflows, Legs, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books]

Facts: => Big, common keys

Dimensions: => Small, crosscutting keys

Trades MTMs

Common Key

Coherence’s KeyAssociation gives us this
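A rough sketch of how key association can keep facts that share a common key on the same partition. To stay runnable without Coherence on the classpath, it declares a local stand-in interface; the assumption is that this mirrors the shape of Coherence's KeyAssociation (a single getAssociatedKey method), which should be checked against the Coherence documentation. MtmKey, its fields and the toy partitionFor function are hypothetical.

```java
import java.util.Objects;

// Local stand-in so the sketch compiles without Coherence on the classpath.
// Assumption: it mirrors the shape of Coherence's KeyAssociation interface.
interface KeyAssociation {
    Object getAssociatedKey();
}

// Hypothetical key for an MTM fact that belongs to a trade.
final class MtmKey implements KeyAssociation {
    private final long mtmId;
    private final long tradeId;

    MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    // Co-locate this MTM with its parent trade by routing on the trade id.
    @Override
    public Object getAssociatedKey() {
        return tradeId;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof MtmKey)) {
            return false;
        }
        MtmKey that = (MtmKey) other;
        return mtmId == that.mtmId && tradeId == that.tradeId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(mtmId, tradeId);
    }
}

final class KeyAssociationSketch {
    // Toy partition function: use the associated key when one is declared.
    static int partitionFor(Object key, int partitionCount) {
        Object routingKey = (key instanceof KeyAssociation)
                ? ((KeyAssociation) key).getAssociatedKey()
                : key;
        return Math.floorMod(routingKey.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        long tradeId = 42L;
        int tradePartition = partitionFor(tradeId, 257);
        int mtmPartition = partitionFor(new MtmKey(1001L, tradeId), 257);
        System.out.println("same partition: " + (tradePartition == mtmPartition));
    }
}
```

Because every MTM key routes on its trade id, a Trade and its MTMs always land together, so joining them never needs the network.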


Data Layer

Transactions

Cashflows

Query Layer

Mtms

Fact Storage (Partitioned)

Trade

Party Trader

Transactions

Cashflows

Dimensions (replicate)

Mtms


Facts (distribute/partition)

[Bar chart, axis 0 to 150,000,000, one bar per entity from Valuation Legs to Set of Books]

Facts: => Big => Distribute

Dimensions: => Small => Replicate

We use a variant on a Snowflake Schema to partition big stuff that has the same key, and replicate small stuff that has crosscutting keys.

Replicate

Distribute

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Transactions

Cashflows

Mtms

Partitioned Storage

LBs[]=getLedgerBooksFor(CC1)

SBs[]=getSourceBooksFor(LBs[])

So we have all the bottom level dimensions needed to query facts






Get all Transactions and MTMs (cluster side join) for the passed Source Books



Populate raw facts (Transactions) with dimension data before returning to client.
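The three query steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the ODC implementation: the class SnowflakeQuerySketch, the Partition type, the map layouts and the row format are all hypothetical. Step 1 resolves dimensions from replicated data, step 2 joins Transactions to MTMs inside one partition, and step 3 attaches dimension values before rows are returned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the three query steps; not the ODC implementation.
final class SnowflakeQuerySketch {

    // One partition of fact storage. Transactions and MTMs share a key,
    // so both sides of the join are always on the same node.
    static final class Partition {
        final Map<Long, String> sourceBookByTransaction = new HashMap<>();
        final Map<Long, Double> mtmByTransaction = new HashMap<>();
    }

    // Step 1: walk the replicated dimensions locally,
    // Cost Centre -> Ledger Books -> Source Books.
    static Set<String> sourceBooksFor(String costCentre,
                                      Map<String, List<String>> ledgerBooksByCostCentre,
                                      Map<String, List<String>> sourceBooksByLedgerBook) {
        Set<String> sourceBooks = new HashSet<>();
        List<String> ledgerBooks =
                ledgerBooksByCostCentre.getOrDefault(costCentre, Collections.<String>emptyList());
        for (String ledgerBook : ledgerBooks) {
            sourceBooks.addAll(
                    sourceBooksByLedgerBook.getOrDefault(ledgerBook, Collections.<String>emptyList()));
        }
        return sourceBooks;
    }

    // Steps 2 and 3: on one partition, join Transactions to MTMs and
    // attach the dimension values before the rows leave the node.
    static List<String> queryPartition(Partition partition, Set<String> sourceBooks, String costCentre) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Long, String> txn : partition.sourceBookByTransaction.entrySet()) {
            if (sourceBooks.contains(txn.getValue())) {
                Double mtm = partition.mtmByTransaction.get(txn.getKey());
                rows.add("txn=" + txn.getKey() + " mtm=" + mtm
                        + " sourceBook=" + txn.getValue() + " costCentre=" + costCentre);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ledgerBooks = new HashMap<>();
        ledgerBooks.put("CC1", Collections.singletonList("LB1"));
        Map<String, List<String>> sourceBooks = new HashMap<>();
        sourceBooks.put("LB1", Collections.singletonList("SB1"));

        Partition partition = new Partition();
        partition.sourceBookByTransaction.put(1L, "SB1");
        partition.mtmByTransaction.put(1L, 12.5);
        partition.sourceBookByTransaction.put(2L, "SB9"); // not in CC1, filtered out

        Set<String> wanted = sourceBooksFor("CC1", ledgerBooks, sourceBooks);
        for (String row : queryPartition(partition, wanted, "CC1")) {
            System.out.println(row);
        }
    }
}
```

In a real cluster queryPartition would run on every partition in parallel and the client would concatenate the results. Only the small set of source book ids travels to the nodes, and only finished rows travel back.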


Java client API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join!

So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.


[Bar chart, axis 0 to 125,000,000, one bar per entity from Valuation Legs to Set of Books]

Facts

Dimensions

This is a dimension

•  It has a different key to the Facts.

•  And it’s BIG

[Bar charts, dimensions only (Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Business Unit, HCS Entity, Set of Books): first with axis 0 to 5,000,000, then with axis 20 to 5,000,000]

So we only replicate ‘Connected’ or ‘Used’ dimensions

Data Layer

Dimension Caches (Replicated)

Transactions

Cashflows

Processing Layer

Mtms


As new Facts are added, the relevant Dimensions that they reference are moved to processing layer caches.

Trade

Party Alias

Source Book

Ccy

Data Layer (All Normalised)

Query Layer (With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger




Trade

Party Alias

Source Book

Ccy

Party

LedgerBook



‘Connected Replication’

A simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated
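A minimal sketch of the recursion, assuming each entity exposes the ids of the dimensions it references. The class ConnectedReplicationSketch, the Entity type, the id format and the onFactSaved hook are all hypothetical; in the slides this role is played by a trigger on the partitioned cache.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Connected Replication; all names are illustrative.
final class ConnectedReplicationSketch {

    // An entity is an id plus the ids of the dimensions it references.
    static final class Entity {
        final String id;
        final List<String> foreignKeys;

        Entity(String id, String... foreignKeys) {
            this.id = id;
            this.foreignKeys = Arrays.asList(foreignKeys);
        }
    }

    // Data layer: every dimension, normalised.
    private final Map<String, Entity> dataLayer = new HashMap<>();
    // Replicated caches: only dimensions reachable from a saved fact.
    private final Map<String, Entity> replicated = new HashMap<>();

    void addDimension(Entity dimension) {
        dataLayer.put(dimension.id, dimension);
    }

    // Plays the role of the trigger that fires when a fact is saved.
    void onFactSaved(Entity fact) {
        for (String foreignKey : fact.foreignKeys) {
            replicate(foreignKey);
        }
    }

    // Recurse through foreign keys, copying each newly connected dimension.
    private void replicate(String id) {
        if (replicated.containsKey(id)) {
            return; // already connected; this also stops cycles
        }
        Entity dimension = dataLayer.get(id);
        if (dimension == null) {
            return; // dangling reference, ignored in this sketch
        }
        replicated.put(id, dimension);
        for (String foreignKey : dimension.foreignKeys) {
            replicate(foreignKey);
        }
    }

    boolean isReplicated(String id) {
        return replicated.containsKey(id);
    }

    public static void main(String[] args) {
        ConnectedReplicationSketch db = new ConnectedReplicationSketch();
        db.addDimension(new Entity("PartyAlias:1", "Party:1"));
        db.addDimension(new Entity("Party:1"));
        db.addDimension(new Entity("SourceBook:1", "LedgerBook:1"));
        db.addDimension(new Entity("LedgerBook:1"));
        db.addDimension(new Entity("Ccy:USD"));
        db.addDimension(new Entity("Party:2")); // never referenced by a fact

        db.onFactSaved(new Entity("Trade:1", "PartyAlias:1", "SourceBook:1", "Ccy:USD"));

        System.out.println("Party:1 replicated: " + db.isReplicated("Party:1"));
        System.out.println("Party:2 replicated: " + db.isReplicated("Party:2"));
    }
}
```

Saving the Trade pulls in Party Alias, Source Book and Ccy directly, then Party and LedgerBook through the second level of foreign keys. Dimensions that no fact refers to stay in the data layer only.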

Java client API

Java schema

Java ‘Stored Procedures’ and ‘Triggers’

Partitioned Storage
