Balancing Replication and Partitioning in a Distributed Java Database

99

description

This talk, presented at JavaOne 2011, describes the ODC, a distributed, in-memory database built in Java that holds objects in a normalized form in a way that alleviates the traditional degradation in performance associated with joins in shared-nothing architectures. The presentation describes the two patterns that lie at the core of this model. The first is an adaptation of the Star Schema model used to hold data either replicated or partitioned data, depending on whether the data is a fact or a dimension. In the second pattern, the data store tracks arcs on the object graph to ensure that only the minimum amount of data is replicated. Through these mechanisms, almost any join can be performed across the various entities stored in the grid, without the need for key shipping or iterative wire calls.

Transcript of Balancing Replication and Partitioning in a Distributed Java Database

Page 1: Balancing Replication and Partitioning in a Distributed Java Database
Page 2: Balancing Replication and Partitioning in a Distributed Java Database

Traditional disk-oriented database architecture is showing its age.

In-memory databases offer significant gains in performance as all data is freely available. There is no need to page to and from disk.

But three issues have stunted their uptake: Address spaces only being large enough for a subset of a typical user’s data. The ‘one more bit’ problem and durability.

Distributed in-memory databases solve these three problems but at the price of loosing the single address space.

Snowflake Schemas allow us to mix Partitioning and Replication so joins never hit the wire.

This makes joins a problem. When data must be joined across multiple machines performance degradation is inevitable.

But this model only goes so far. “Connected Replication” takes us a step further allowing us to make the best possible use of replication.

Page 3: Balancing Replication and Partitioning in a Distributed Java Database
Page 4: Balancing Replication and Partitioning in a Distributed Java Database
Page 5: Balancing Replication and Partitioning in a Distributed Java Database

The lay of the land:

The main architectural constructs in the database

industry

Page 6: Balancing Replication and Partitioning in a Distributed Java Database
Page 7: Balancing Replication and Partitioning in a Distributed Java Database

Shared Disk

Page 8: Balancing Replication and Partitioning in a Distributed Java Database
Page 9: Balancing Replication and Partitioning in a Distributed Java Database
Page 10: Balancing Replication and Partitioning in a Distributed Java Database

0.000,000,000,000

μs ns ps ms

L1 Cache Ref

L2 Cache Ref

Main Memory Ref

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB Disk/Network

* L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm

Page 11: Balancing Replication and Partitioning in a Distributed Java Database
Page 12: Balancing Replication and Partitioning in a Distributed Java Database
Page 13: Balancing Replication and Partitioning in a Distributed Java Database
Page 14: Balancing Replication and Partitioning in a Distributed Java Database
Page 15: Balancing Replication and Partitioning in a Distributed Java Database
Page 16: Balancing Replication and Partitioning in a Distributed Java Database
Page 17: Balancing Replication and Partitioning in a Distributed Java Database
Page 18: Balancing Replication and Partitioning in a Distributed Java Database
Page 19: Balancing Replication and Partitioning in a Distributed Java Database

Distributed Cache

Page 20: Balancing Replication and Partitioning in a Distributed Java Database
Page 21: Balancing Replication and Partitioning in a Distributed Java Database

Taken from “OLTP Through the Looking Glass, and What We Found There” Harizopoulos et al

Page 22: Balancing Replication and Partitioning in a Distributed Java Database
Page 23: Balancing Replication and Partitioning in a Distributed Java Database

Regular Database Oracle, Sybase,

MySql

ODC

SN In-Memory

Exasol, VoltDB, Hana

DistributedCaching

Coherence, Gemfire, Gigaspaces

Shared Nothing

Teradata, Vertica, Greenplumb…

In-Memory Database

Times Ten, HSQL, KDB

Distribute Drop Disk

Page 24: Balancing Replication and Partitioning in a Distributed Java Database

Distributed Architecture

Simplify the Contract.

Stick to RAM

Page 25: Balancing Replication and Partitioning in a Distributed Java Database

450 processes

Messaging (Topic Based) as a system of record (persistence)

2TB of RAM Oracle

Coherence

Page 26: Balancing Replication and Partitioning in a Distributed Java Database

Dat

a La

yer Transactions

Cashflows

Que

ry L

ayer

Mtms

Acc

ess

Laye

r Java client

API

Java client

API

Per

sist

ence

Lay

er

Page 27: Balancing Replication and Partitioning in a Distributed Java Database

Indexing

Replication Partitioning

Page 28: Balancing Replication and Partitioning in a Distributed Java Database

But your storage is limited by the memory on a node

Page 29: Balancing Replication and Partitioning in a Distributed Java Database

Scalable storage, bandwidth and processing

Keys Fs-Fz Keys Xa-Yd

Page 30: Balancing Replication and Partitioning in a Distributed Java Database
Page 31: Balancing Replication and Partitioning in a Distributed Java Database
Page 32: Balancing Replication and Partitioning in a Distributed Java Database
Page 33: Balancing Replication and Partitioning in a Distributed Java Database

Trade

Party Trader Version 1

Trade

Party Trader Version 2

Trade

Party Trader Version 3

Trade

Party Trader Version 4

…and you need versioning to do MVCC

Page 34: Balancing Replication and Partitioning in a Distributed Java Database

Trade Party Trader

Trade

Trade

Party

Party

Party

Trader

Trader

Page 35: Balancing Replication and Partitioning in a Distributed Java Database

So better to use partitioning, spreading

data around the cluster.

Page 36: Balancing Replication and Partitioning in a Distributed Java Database

Trade

Party Trader

Trade Party Trader

Page 37: Balancing Replication and Partitioning in a Distributed Java Database

Trade

Party Trader

Trade Party Trader

Page 38: Balancing Replication and Partitioning in a Distributed Java Database
Page 39: Balancing Replication and Partitioning in a Distributed Java Database
Page 40: Balancing Replication and Partitioning in a Distributed Java Database
Page 41: Balancing Replication and Partitioning in a Distributed Java Database

!This is what using Snowflake Schemas and

the Connected Replication pattern is all about!

Page 42: Balancing Replication and Partitioning in a Distributed Java Database
Page 43: Balancing Replication and Partitioning in a Distributed Java Database
Page 44: Balancing Replication and Partitioning in a Distributed Java Database

Common Keys

Crosscutting Keys

Page 45: Balancing Replication and Partitioning in a Distributed Java Database

Trade

Party Trader

Partitioned

Replicated

Page 46: Balancing Replication and Partitioning in a Distributed Java Database
Page 47: Balancing Replication and Partitioning in a Distributed Java Database
Page 48: Balancing Replication and Partitioning in a Distributed Java Database
Page 49: Balancing Replication and Partitioning in a Distributed Java Database
Page 50: Balancing Replication and Partitioning in a Distributed Java Database
Page 51: Balancing Replication and Partitioning in a Distributed Java Database
Page 52: Balancing Replication and Partitioning in a Distributed Java Database

Valuation Legs

Valuations

Part Transaction Mapping

Cashflow Mapping

Party Alias

Transaction

Cashflows

Legs

Parties

Ledger Book

Source Book

Cost Centre

Product

Risk Organisation Unit

Business Unit

HCS Entity

Set of Books

0 37,500,000 75,000,000 112,500,000 150,000,000

Facts: =>Big, common keys

Dimensions =>Small, crosscutting Keys

Page 53: Balancing Replication and Partitioning in a Distributed Java Database
Page 54: Balancing Replication and Partitioning in a Distributed Java Database

Trades MTMs

Common Key

Coherence’s KeyAssociation

gives us this

Page 55: Balancing Replication and Partitioning in a Distributed Java Database

Trade

Party Trader

Partitioned (

Replicated

Page 56: Balancing Replication and Partitioning in a Distributed Java Database

Data Layer

Transactions

Cashflows

Query Layer

Mtms

Fact Storage (Partitioned)

Trade

Party Trader

Page 57: Balancing Replication and Partitioning in a Distributed Java Database

Transactions

Cashflows

Dimensions (repliacte)

Mtms

Fact Storage (Partitioned)

Facts (distribute/

partition)

Page 58: Balancing Replication and Partitioning in a Distributed Java Database

Valuation Legs

Valuations

Part Transaction Mapping

Cashflow Mapping

Party Alias

Transaction

Cashflows

Legs

Parties

Ledger Book

Source Book

Cost Centre

Product

Risk Organisation Unit

Business Unit

HCS Entity

Set of Books

0 37,500,000 75,000,000 112,500,000 150,000,000

Facts: =>Big =>Distribute

Dimensions =>Small => Replicate

Page 59: Balancing Replication and Partitioning in a Distributed Java Database

We use a variant on a Snowflake Schema to

partition big stuff, that has the same key and replicate

small stuff that has crosscutting keys.

Page 60: Balancing Replication and Partitioning in a Distributed Java Database

Replicate

Distribute

Page 61: Balancing Replication and Partitioning in a Distributed Java Database

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Page 62: Balancing Replication and Partitioning in a Distributed Java Database
Page 63: Balancing Replication and Partitioning in a Distributed Java Database

Transactions

Cashflows

Mtms

Partitioned Storage

LBs[]=getLedgerBooksFor(CC1)

SBs[]=getSourceBooksFor(LBs[])

So we have all the bottom level dimensions needed to query facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Page 64: Balancing Replication and Partitioning in a Distributed Java Database

Transactions

Cashflows

Mtms

Partitioned Storage

LBs[]=getLedgerBooksFor(CC1)

SBs[]=getSourceBooksFor(LBs[])

So we have all the bottom level dimensions needed to query facts

Get all Transactions and MTMs (cluster side join) for the passed Source Books

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Page 65: Balancing Replication and Partitioning in a Distributed Java Database
Page 66: Balancing Replication and Partitioning in a Distributed Java Database

Transactions

Cashflows

Mtms

Partitioned Storage

LBs[]=getLedgerBooksFor(CC1)

SBs[]=getSourceBooksFor(LBs[])

So we have all the bottom level dimensions needed to query facts

Get all Transactions and MTMs (cluster side join) for the passed Source Books

Populate raw facts (Transactions) with dimension data before returning to client.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Page 67: Balancing Replication and Partitioning in a Distributed Java Database
Page 68: Balancing Replication and Partitioning in a Distributed Java Database

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join!

Page 69: Balancing Replication and Partitioning in a Distributed Java Database

So all the big stuff is held paritioned

And we can join without shipping keys

around and having intermediate results

Page 70: Balancing Replication and Partitioning in a Distributed Java Database

Trade

Party Trader

Trade Party Trader

Page 71: Balancing Replication and Partitioning in a Distributed Java Database

Trade

Party Trader Version 1

Trade

Party Trader Version 2

Trade

Party Trader Version 3

Trade

Party Trader Version 4

Page 72: Balancing Replication and Partitioning in a Distributed Java Database

Trade Party Trader

Trade

Trade

Party

Party

Party

Trader

Trader

Page 73: Balancing Replication and Partitioning in a Distributed Java Database
Page 74: Balancing Replication and Partitioning in a Distributed Java Database
Page 75: Balancing Replication and Partitioning in a Distributed Java Database
Page 76: Balancing Replication and Partitioning in a Distributed Java Database
Page 77: Balancing Replication and Partitioning in a Distributed Java Database
Page 78: Balancing Replication and Partitioning in a Distributed Java Database

Valuation Legs

Valuations

Part Transaction Mapping

Cashflow Mapping

Party Alias

Transaction

Cashflows

Legs

Parties

Ledger Book

Source Book

Cost Centre

Product

Risk Organisation Unit

Business Unit

HCS Entity

Set of Books

0 125,000,000

Facts

Dimensions

This is a dimension •  It has a different

key to the Facts. •  And it’s BIG

Page 79: Balancing Replication and Partitioning in a Distributed Java Database
Page 80: Balancing Replication and Partitioning in a Distributed Java Database
Page 81: Balancing Replication and Partitioning in a Distributed Java Database
Page 82: Balancing Replication and Partitioning in a Distributed Java Database
Page 83: Balancing Replication and Partitioning in a Distributed Java Database

Party Alias

Parties

Ledger Book

Source Book

Cost Centre

Product

Risk Organisation Unit

Business Unit

HCS Entity

Set of Books

0 1,250,000 2,500,000 3,750,000 5,000,000

Page 84: Balancing Replication and Partitioning in a Distributed Java Database

Party Alias

Parties

Ledger Book

Source Book

Cost Centre

Product

Risk Organisation Unit

Business Unit

HCS Entity

Set of Books

20 1,250,015 2,500,010 3,750,005 5,000,000

Page 85: Balancing Replication and Partitioning in a Distributed Java Database
Page 86: Balancing Replication and Partitioning in a Distributed Java Database

So we only replicate ‘Connected’ or ‘Used’

dimensions

Page 87: Balancing Replication and Partitioning in a Distributed Java Database

Data Layer

Dimension Caches (Replicated)

Transactions

Cashflows

Processing Layer

Mtms

Fact Storage (Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

Page 88: Balancing Replication and Partitioning in a Distributed Java Database
Page 89: Balancing Replication and Partitioning in a Distributed Java Database

Trade

Party Alias

Source Book

Ccy

Data Layer (All Normalised)

Query Layer (With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

Page 90: Balancing Replication and Partitioning in a Distributed Java Database

Trade

Party Alias

Source Book

Ccy

Data Layer (All Normalised)

Query Layer (With connected dimension Caches)

Page 91: Balancing Replication and Partitioning in a Distributed Java Database

Trade

Party Alias

Source Book

Ccy

Party

LedgerBook

Data Layer (All Normalised)

Query Layer (With connected dimension Caches)

Page 92: Balancing Replication and Partitioning in a Distributed Java Database

‘Connected Replication’

A simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated

Page 93: Balancing Replication and Partitioning in a Distributed Java Database
Page 94: Balancing Replication and Partitioning in a Distributed Java Database

Java client

API Java schema Java ‘Stored Procedures’

and ‘Triggers’

Page 95: Balancing Replication and Partitioning in a Distributed Java Database
Page 96: Balancing Replication and Partitioning in a Distributed Java Database
Page 97: Balancing Replication and Partitioning in a Distributed Java Database

Partitioned Storage

Page 98: Balancing Replication and Partitioning in a Distributed Java Database
Page 99: Balancing Replication and Partitioning in a Distributed Java Database