Advanced Databases, Ben Stopford


Data Storage for Extreme Use Cases: The Lay of the Land and a Peek at ODC

Ben Stopford, RBS

How fast is a HashMap lookup?

~20 ns

That's how long it takes light to travel across a room.
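As a rough illustration of that number (a back-of-the-envelope sketch only; a proper harness such as JMH would be needed for trustworthy figures, and the map size and loop counts below are just assumptions), you can time repeated lookups on a warm HashMap in plain Java:

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapLookupTiming {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put(i, "value-" + i);
        }

        int iterations = 10_000_000;
        long blackhole = 0;                      // keeps the JIT from eliding the lookups
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            String v = map.get(i % 1_000_000);   // repeated lookups on a warm, in-process map
            blackhole += v.length();
        }
        long elapsed = System.nanoTime() - start;

        System.out.printf("avg lookup ~%d ns (blackhole=%d)%n",
                elapsed / iterations, blackhole);
    }
}
```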

How fast is a database lookup?

~20 ms

That's how long it takes light to go to Australia and back, 3 times.

Computers really are very fast.

The problem is we're quite good at writing software that slows them down.

Question

Is it fair to compare the performance of a database with a HashMap?

Of course not…
• Physical diversity: a database call involves both network and disk.
• Functional diversity: databases provide a wealth of additional features, including persistence, transactions, consistency, etc.

[Latency chart, ps to ns to μs to ms scale: L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, Ethernet ping, cross-continental round trip, 1MB from disk/Ethernet, RDMA over InfiniBand]

An L1 ref is about 2 clock cycles, or ~0.7 ns. This is the time it takes light to travel 20 cm.

Mechanical Sympathy

Key Point 1

Simple computer programs, operating in a single address space, are extremely fast.

Why are there so many types of database these days? …because we need different architectures for different jobs.

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example, IBM's System R).

"Because RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark, then there is no market where they are competitive. As such, they should be considered as legacy technology, more than a quarter of a century in age, for which a complete redesign and re-architecting is the appropriate next step."

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

• Data lives on disk.
• Users have an allocated user space where intermediary results are calculated.
• The database brings data, normally via indexes, into memory and performs filter, join, reordering and aggregation operations.
• The result is sent to the user.

[Diagram: the architectural landscape - Traditional, Shared Disk, Shared Nothing, In Memory and Distributed In Memory - arranged along a 'Simpler Contract' axis]

Key Point 2

Different architectural decisions about how we store and access data are needed in different environments. Our 'context' has changed.

Simplifying the Contract

How big is the internet?

5 exabytes

(which is 5,000 petabytes, or 5,000,000 terabytes)

How big is an average enterprise database?

80% < 1TB (in 2009)

The context of our problem has changed.

Simplifying the Contract

• For some use cases, ACID transactions are overkill.
• Implementing ACID in a distributed architecture has a significant effect on performance.
• This is where the NoSQL movement came from.

Databases have huge operational overheads

Research with the Shore DB indicates that only 6.8% of instructions contribute to 'useful work'.

Taken from "OLTP Through the Looking Glass, and What We Found There", Harizopoulos et al.

Avoid that overhead with a simpler contract and by avoiding IO.

Key Point 3

For the very top-end data volumes, a simpler contract is mandatory. ACID is simply not possible.

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows it.

Options for scaling out the traditional architecture

1. The Shared Disk Architecture

• More 'grunt'.
• Popular for mid-range data sets.
• Multiple machines must contend for ownership (distributed disk/lock contention).

2. The Shared Nothing Architecture

• Massive storage potential.
• Massive scalability of processing.
• Popular for high-level storage solutions.
• Commodity hardware.
• Around since the 80s, but only really popular since the Big Data era.
• Limited by cross-partition joins.

Each machine is responsible for a subset of the records. Each record exists on only one machine.

[Diagram: a client routing requests to partitions that each own a range of record keys, e.g. 1, 2, 3…; 97, 98, 99…; 169, 170…; 244, 245…; 333, 334…; 765, 769…]
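A minimal sketch of the routing this implies (the modulo-hash scheme and node names are illustrative assumptions, not any particular product's algorithm): every key maps to exactly one owning machine, so each record lives in exactly one place.

```java
import java.util.List;

public class PartitionRouter {
    private final List<String> nodes;               // e.g. host names of the cluster members

    public PartitionRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    /** Every key maps to exactly one node, so each record lives on only one machine. */
    public String ownerOf(Object key) {
        int bucket = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(bucket);
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(List.of("nodeA", "nodeB", "nodeC"));
        System.out.println(router.ownerOf(97));     // the client sends record 97 straight to its owner
        System.out.println(router.ownerOf(765));
    }
}
```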

3. The In-Memory Database (single address space)

Databases must cache subsets of the data in memory.

Not knowing what you don't know

Most queries still go to disk to "see what they missed".

[Diagram: 90% of the data in cache, the remainder on disk]

If you can fit it ALL in memory you know everything

The architecture of an in-memory database

• All data is at your fingertips.
• Query plans become less important, as there is no IO.
• Intermediary results are just pointers.

Memory is at least 100x faster than disk

[Latency chart, ps to ns to μs to ms scale: L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, cross-network round trip, cross-continental round trip, 1MB from disk/network]

An L1 ref is about 2 clock cycles, or ~0.7 ns. This is the time it takes light to travel 20 cm.

Random vs sequential access: memory allows random access; disk only works well for sequential reads.

This makes in-memory databases very fast.

The proof is in the stats: TPC-H benchmarks on a 1TB data set

• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

NB: TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's SPARC SuperCluster.

So why haven't in-memory databases taken off?

Address spaces are relatively small and of a finite, fixed size.

• What happens when your data grows beyond your available memory?

The 'one more bit' problem

Durability

What happens when you pull the plug?

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

[Diagram: as before, a client and records partitioned across machines, but this time held in RAM]

Distribution solves our two problems

• Solve the 'one more bit' problem by adding more hardware.
• Solve the durability problem with backups on another machine.

We get massive amounts of parallel processing.

But at the cost of losing the single address space.

[Diagram: the architectural landscape again - Traditional, Shared Disk, Shared Nothing, In Memory and Distributed In Memory - along the 'Simpler Contract' axis]

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture.
• Simplify the contract: improve scalability by picking appropriate ACID properties.
• No disk: all data is held in RAM.

These three non-functional themes lie behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is latency?

Latency is a measure of response time.

What is throughput?

Throughput is a measure of the amount of work (or messages) processed in a prescribed amount of time.

Which is best for latency?

[Diagram: Traditional Database vs Shared Nothing (Distributed) In-Memory Database, compared on latency]

Which is best for throughput?

[Diagram: Traditional Database vs Shared Nothing (Distributed) In-Memory Database, compared on throughput]

So why do we use distributed in-memory?

[Diagram: In Memory plus plentiful hardware gives both low latency and high throughput]

ODC: Distributed, Shared Nothing, In Memory, Semi-Normalised

• Real-time graph DB
• 450 processes
• Messaging (topic based) as a system of record (persistence)
• 2TB of RAM

The Layers

[Diagram: an Access Layer (Java client API) over a Query Layer, over a Data Layer holding Transactions, Cashflows and MTMs, with a Persistence Layer beneath]

Three Tools of Distributed Data Architecture

• Indexing
• Replication
• Partitioning

How should we use these tools?

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scales. [Diagram: keys Aa-Ap assigned to one partition]

Scalable storage, bandwidth and processing.

Associating data in different partitions implies moving it.

So we have some data. Our data is bound together in a model.

[Object-model diagram: Trade, Party, Trader, Desk, Name and Sub entities linked together]

Which we save.

[Diagram: the Trade, Party and Trader entities written to the store across nodes]

Binding them back together involves a "distributed join" => lots of network hops.

[Diagram: Trade, Party and Trader on different nodes, joined over the network]

The hops have to be spread over time.

[Diagram: network hops laid out along a time axis]

Lots of network hops makes it slow.

OK, what if we held it all together? "Denormalised"

Hence denormalisation is FAST (for reads).

Denormalisation implies the duplication of some sub-entities.

…and that means managing consistency over lots of copies.

…and all the duplication means you run out of space really quickly.

Space issues are exacerbated further when data is versioned.

[Diagram: the same Trade/Party/Trader graph duplicated as Version 1, Version 2, Version 3 and Version 4]

…and you need versioning to do MVCC.

And reconstituting a previous time slice becomes very difficult.

[Diagram: many versions of Trade, Party and Trader entities scattered across the store]

So we want to hold entities separately (normalised), to alleviate concerns around consistency and space usage.

Remember, this means the object graph will be split across multiple machines.

[Diagram: Trade, Party and Trader on separate machines; each entity is independently versioned and each datum is a singleton]

Binding them back together involves a "distributed join" => lots of network hops.

[Diagram: Trade, Party and Trader joined across the network]

Whereas in the denormalised model, the join is already done.

So what we want are the advantages of a normalised store at the speed of a denormalised one.

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.

[Diagram: common keys vs crosscutting keys]

We tackle this problem with a hybrid model.

[Diagram: Trade is partitioned; Party and Trader are replicated]

We adapt the concept of a Snowflake Schema.

Taking the concept of Facts and Dimensions:

Everything starts from a Core Fact (Trades, for us).

Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition).

Dimensions have many keys (which crosscut the partitioning key).

Looking at the data:

Facts => big, common keys
Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join.

… so we only want to 'join' data that is in the same process.

[Diagram: Trades and MTMs collocated via their common key]

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence).
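A minimal sketch of such a policy, assuming Oracle Coherence's KeyAssociation interface; the MtmKey class and its fields are illustrative, not ODC's real types. The MTM's key declares the trade id as its associated key, so every MTM lands in the same partition as its owning Trade:

```java
import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;
import java.util.Objects;

// Illustrative key for an MTM fact. Coherence places this key in the same partition
// as the key returned by getAssociatedKey(), i.e. with the owning Trade.
public class MtmKey implements KeyAssociation, Serializable {

    private final long mtmId;
    private final long tradeId;   // the common key shared with the Trade fact

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;           // collocate this MTM with its Trade
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MtmKey other
                && other.mtmId == mtmId && other.tradeId == tradeId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(mtmId, tradeId);
    }
}
```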

So we prescribe different physical storage for Facts and Dimensions.

[Diagram: Trade partitioned; Party and Trader replicated]

Facts are partitioned; dimensions are replicated.

[Diagram: Transactions, Cashflows and MTMs held as Facts in partitioned Fact Storage in the Data Layer; Party, Trader and other Dimensions replicated in the Query Layer]

Facts are partitioned; dimensions are replicated: Facts => distribute/partition, Dimensions => replicate.
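As a rough sketch of that prescription in Coherence terms (the cache names, and the assumption that they map onto partitioned vs replicated schemes in the cluster's cache configuration, are illustrative rather than ODC's actual setup):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class OdcStyleStores {
    public static void main(String[] args) {
        // Facts: big, share the partitioning key => backed by a partitioned (distributed) scheme
        NamedCache trades = CacheFactory.getCache("dist-trades");
        NamedCache mtms   = CacheFactory.getCache("dist-mtms");

        // Dimensions: small, crosscutting keys => backed by a replicated scheme on every node
        NamedCache parties = CacheFactory.getCache("repl-parties");
        NamedCache traders = CacheFactory.getCache("repl-traders");

        trades.put(1L, "trade-1");          // lands on exactly one partition owner
        parties.put("GS", "Goldman Sachs"); // copied to every member of the cluster
    }
}
```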

The data volumes back this up as a sensible hypothesis.

Facts => big => distribute
Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

[Diagram: Replicate vs Distribute]

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'

What would this look like without this pattern?

[Diagram: a sequence of network hops over time - get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs, get Cost Centres]

But by balancing Replication and Partitioning, we don't need all those hops.

Stage 1: Focus on the where clause: Where CostCentre = 'CC1'

[Diagram: Transactions, Cashflows and MTMs in partitioned storage]

Stage 1: Get the right keys to query the Facts - join Dimensions in the Query Layer.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'

[Diagram: Transactions, Cashflows and MTMs in partitioned storage]

Stage 2: Cluster join to get Facts - join Dimensions in the Query Layer, then join Facts across the cluster.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated.

[Diagram: Transactions, Cashflows and MTMs in partitioned storage]

Stage 3: Augment raw Facts with relevant Dimensions - join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'

Stage 3: Bind relevant dimensions to the result.
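A plain-Java sketch of the three stages, using in-process Maps to stand in for the replicated dimension caches and the partitioned fact store (all the type and field names here are illustrative assumptions, not ODC's API):

```java
import java.util.*;

public class ThreeStageQuery {

    record Transaction(long tradeId, String costCentreId) {}
    record Mtm(long mtmId, long tradeId, double value) {}

    // Replicated dimension cache: a full copy lives on every node, so lookups are local.
    static final Map<String, String> costCentres = Map.of("CC1", "Rates Desk");

    // Partitioned fact stores, keyed by the common partitioning key (tradeId).
    static final Map<Long, Transaction> transactions = new HashMap<>();
    static final Map<Long, List<Mtm>> mtmsByTrade = new HashMap<>();

    public static void main(String[] args) {
        transactions.put(1L, new Transaction(1L, "CC1"));
        transactions.put(2L, new Transaction(2L, "CC9"));
        mtmsByTrade.put(1L, List.of(new Mtm(10L, 1L, 42.0)));

        // Stage 1: resolve the where clause against the replicated dimensions (no network hops).
        Set<String> wantedCostCentres = Set.of("CC1");

        // Stage 2: join the facts. Transactions and MTMs sharing a tradeId are collocated,
        // so this join runs inside each partition rather than across the cluster.
        List<String> results = new ArrayList<>();
        for (Transaction t : transactions.values()) {
            if (!wantedCostCentres.contains(t.costCentreId())) continue;
            for (Mtm m : mtmsByTrade.getOrDefault(t.tradeId(), List.of())) {
                // Stage 3: bind the relevant dimension data to the result, again from local replicas.
                String costCentreName = costCentres.get(t.costCentreId());
                results.add("trade=" + t.tradeId() + " mtm=" + m.mtmId()
                        + " value=" + m.value() + " costCentre=" + costCentreName);
            }
        }
        results.forEach(System.out::println);
    }
}
```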

Bringing it together

[Diagram: a Java client API querying replicated Dimensions and partitioned Facts]

We never have to do a distributed join.

So all the big stuff is held partitioned.

And we can join without shipping keys around and having intermediate results.

We get to do this…

[Diagram: normalised Trade, Party and Trader entities held separately]

…and this…

[Diagram: the Trade/Party/Trader graph at Versions 1 to 4]

…and this…

[Diagram: many versions of Trade, Party and Trader entities across the store]

…without the problems of this…

…or this…

…all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: the model split into Facts and Dimensions]

This is a dimension:
• It has a different key to the Facts.
• And it's BIG.

We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.

Fortunately there is a simple solution:

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large.

But connected Dimension data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store, we keep our 'Connected Caches' up to date.

[Diagram: replicated Dimension Caches in the Processing Layer above Transactions, Cashflows and MTMs in partitioned Fact Storage in the Data Layer]

As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches.

The replicated layer is updated by recursing through the arcs of the domain model when facts change.

Saving a trade causes all its first-level references to be triggered.

[Diagram: Save Trade hits the partitioned cache in the Data Layer (all normalised); a cache-store trigger fires for the Trade's references - Party, Alias, Source, Book, Ccy - while the Query Layer holds the connected dimension caches]

This updates the connected caches.

[Diagram: Party, Alias, Source, Book and Ccy pushed from the Data Layer (all normalised) into the Query Layer's connected dimension caches]

The process recurses through the object graph.

[Diagram: the recursion continues from Party to its own references, e.g. LedgerBook, again updating the connected dimension caches]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
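A minimal sketch of that recursion, assuming an illustrative domain object that exposes its references and a Map standing in for the replicated connected-dimension cache (none of these types are ODC's actual classes):

```java
import java.util.*;

public class ConnectedReplication {

    /** Illustrative domain object: an entity plus the entities it references (its foreign keys). */
    record Entity(String key, boolean isDimension, List<Entity> references) {}

    /** Stand-in for the replicated connected-dimension cache. */
    static final Map<String, Entity> connectedDimensionCache = new HashMap<>();

    /** Called when a fact (e.g. a Trade) is saved; recurses through the arcs of the model. */
    static void onFactSaved(Entity fact) {
        replicateConnected(fact, new HashSet<>());
    }

    private static void replicateConnected(Entity entity, Set<String> visited) {
        if (!visited.add(entity.key())) {
            return;                                              // already processed; avoids cycles
        }
        for (Entity ref : entity.references()) {
            if (ref.isDimension()) {
                connectedDimensionCache.put(ref.key(), ref);     // only 'connected' dimensions get replicated
            }
            replicateConnected(ref, visited);                    // recurse through the object graph
        }
    }

    public static void main(String[] args) {
        Entity ledgerBook = new Entity("ledgerBook:7", true, List.of());
        Entity party = new Entity("party:GS", true, List.of(ledgerBook));
        Entity ccy = new Entity("ccy:USD", true, List.of());
        Entity trade = new Entity("trade:1", false, List.of(party, ccy));

        onFactSaved(trade);
        System.out.println(connectedDimensionCache.keySet());   // party:GS, ledgerBook:7, ccy:USD
    }
}
```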

With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.

Conclusion

At one end of the scale are the huge shared-nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning, we can do any join in a single step against partitioned storage.

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

How fast is a HashMap lookup

~20 ns

Thatrsquos how long it takes light to travel a room

How fast is a database lookup

~20 ms

Thatrsquos how long it takes light to go to Australia and

back

3 times

Computers really are very fast

The problem is wersquore quite good at writing software that

slows them down

Question

Is it fair to compare the performance of a Database with a HashMap

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Thatrsquos how long it takes light to travel a room

How fast is a database lookup

~20 ms

Thatrsquos how long it takes light to go to Australia and

back

3 times

Computers really are very fast

The problem is wersquore quite good at writing software that

slows them down

Question

Is it fair to compare the performance of a Database with a HashMap

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2: Join the facts together efficiently, as we know they are collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3: Augment raw Facts with relevant Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
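
A toy, in-process model of the three stages may help (plain Java maps stand in for the replicated dimension index and the partitioned fact caches; the names and structure are illustrative, not the real ODC API):

import java.util.*;

public class SnowflakeQuerySketch {
    // Stage 1 input: a replicated dimension index, cost centre -> connected trade ids.
    static Map<String, Set<Long>> tradeIdsByCostCentre = new HashMap<>();

    // Partitioned facts, both keyed by the common partitioning key (the trade id).
    static Map<Long, String> transactionsByTradeId = new HashMap<>();
    static Map<Long, Double> mtmsByTradeId = new HashMap<>();

    static List<String> query(String costCentre) {
        // Stage 1: resolve the where clause against replicated dimensions (local, no hops).
        Set<Long> keys = tradeIdsByCostCentre.getOrDefault(costCentre, Collections.emptySet());

        List<String> results = new ArrayList<>();
        for (Long tradeId : keys) {
            // Stage 2: join the facts; in the grid this runs inside each partition,
            // because Transactions and MTMs share the partitioning key.
            String txn = transactionsByTradeId.get(tradeId);
            Double mtm = mtmsByTradeId.get(tradeId);
            if (txn != null && mtm != null) {
                // Stage 3: bind dimension data (here just the cost centre) to the result.
                results.add(txn + " @ " + mtm + " [" + costCentre + "]");
            }
        }
        return results;
    }
}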

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do this…

Trade

Party

Trader

Trade

Party

Trader

…and this…

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

Facts

Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data, some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate

'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

Data Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Processing Layer

Mtms

Fact Storage (Partitioned)

As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered

Trade

Party

Alias

SourceBook

Ccy

Data Layer (All Normalised)

Query Layer (With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

SourceBook

Ccy

Data Layer (All Normalised)

Query Layer (With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

SourceBook

Ccy

Party

LedgerBook

Data Layer (All Normalised)

Query Layer (With connected dimension Caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
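
A minimal sketch of the recursion (toy types; the real implementation hangs off a cache-store trigger in the grid rather than plain maps): when a fact is written, walk its references and copy any dimension not yet present into the replicated caches.

import java.util.*;

public class ConnectedReplicationSketch {
    // A dimension exposes its key and the dimensions it references in turn.
    interface Dimension {
        Object key();
        Collection<Dimension> references();
    }

    // Stand-in for the replicated 'connected' dimension caches in the query layer.
    static final Map<Object, Dimension> connectedCache = new HashMap<>();

    // Invoked when a fact (e.g. a Trade) is saved, with its first-level references.
    static void onFactSaved(Collection<Dimension> firstLevelReferences) {
        for (Dimension d : firstLevelReferences) {
            replicateConnected(d);
        }
    }

    // Recurse through the arcs of the domain model, replicating anything new.
    static void replicateConnected(Dimension d) {
        if (connectedCache.containsKey(d.key())) {
            return; // already connected, stop recursing down this arc
        }
        connectedCache.put(d.key(), d);
        for (Dimension ref : d.references()) {
            replicateConnected(ref);
        }
    }
}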

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in memory architectures, ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning, we can do any join in a single step

Partitioned Storage

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End

• Further details online: http://www.benstopford.com

• Questions


2 The Shared Nothing Architecture

• Massive storage potential
• Massive scalability of processing
• Popular for high-level storage solutions
• Commodity hardware
• Around since the 80s, but only really popular since the BigData era
• Limited by cross-partition joins

Each machine is responsible for a subset of the records. Each record exists on only one machine.

(Diagram: record keys 1, 2, 3…, 97, 98, 99…, 169, 170…, 244, 245…, 333, 334…, 765, 769… spread across separate machines, with a client connecting to the cluster.)
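To make the routing idea concrete, here is a minimal sketch, assuming invented names (ShardedStore, Node), of how a shared nothing store can send every record to exactly one machine by hashing its key; it is an illustration of the idea, not the implementation of any particular product.

import java.util.List;

// Minimal sketch of shared nothing routing: every key maps to exactly one node.
public class ShardedStore {
    private final List<Node> nodes;

    public ShardedStore(List<Node> nodes) {
        this.nodes = nodes;
    }

    // The owning partition is a pure function of the key, so any client can
    // route a request without consulting a central directory.
    private Node ownerOf(Object key) {
        int index = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(index);
    }

    public void put(Object key, Object value) {
        ownerOf(key).store(key, value);   // the record exists on one machine only
    }

    public Object get(Object key) {
        return ownerOf(key).fetch(key);   // reads go straight to the owning machine
    }

    // Hypothetical node abstraction for the sketch.
    public interface Node {
        void store(Object key, Object value);
        Object fetch(Object key);
    }
}

A cross-partition join is exactly the case this routing cannot help with: keys living on different nodes have to be shipped over the network, which is the limitation noted above.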

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory


Not knowing what you don't know

Most queries still go to disk to "see what they missed"

(Diagram: 90% of the data in cache; the rest only on disk.)

If you can fit it ALL in memory you know everything

The architecture of an in memory database

• All data is at your fingertips
• Query plans become less important as there is no IO
• Intermediary results are just pointers

Memory is at least 100x faster than disk

(Diagram: access latencies on a scale from picoseconds to milliseconds: L1 cache ref, L2 cache ref, main memory ref, reading 1MB from main memory, a cross-network round trip, reading 1MB from disk or over the network, a cross-continental round trip. An L1 ref is about 2 clock cycles, or 0.7ns: the time it takes light to travel 20cm.)

Random vs Sequential Access: memory allows random access; disk only works well for sequential reads

This makes them very fast

The proof is in the stats: TPC-H benchmarks on a 1TB data set
• Exasol: 4,253,937 QphH (In-Memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH
• NB: TPC-H is a decision support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc Supercluster

So why haven't in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

• What happens when your data grows beyond your available memory

The 'One more bit' problem

Durability

What happens when you pull the plug


One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

(Diagram: the same key ranges spread across machines and a connecting client, but this time held only in RAM.)

Distribution solves our two problems

• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of losing the single address space

(Diagram: the landscape of options: Traditional, Shared Disk, Shared Nothing, In Memory and Distributed In Memory architectures, plotted against a Simpler Contract.)

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No Disk: all data is held in RAM

These three non-functional themes lie behind the design of ODC, RBS's in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of work/messages in a prescribed amount of time
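As a rough, illustrative aside (not from the original deck; the class and method names are invented), both can be read off the same run: time each call for latency, and divide completed calls by elapsed time for throughput. In a serial loop the two are roughly reciprocals; they only diverge once work is issued concurrently.

import java.util.function.Supplier;

// Tiny measurement harness: per-call latency vs overall throughput.
public class Measure {
    public static void run(Supplier<Object> operation, int calls) {
        long start = System.nanoTime();
        long worstLatencyNanos = 0;
        for (int i = 0; i < calls; i++) {
            long t0 = System.nanoTime();
            operation.get();                         // the unit of work being measured
            worstLatencyNanos = Math.max(worstLatencyNanos, System.nanoTime() - t0);
        }
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("worst latency: %d ns, throughput: %.0f ops/s%n",
                worstLatencyNanos, calls / elapsedSeconds);
    }
}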

Which is best for latency

(Diagram: latency compared between the Traditional Database and the Shared Nothing (Distributed) In-Memory Database.)

Which is best for throughput

(Diagram: throughput compared between the Traditional Database and the Shared Nothing (Distributed) In-Memory Database.)

So why do we use distributed in-memory

(Diagram: In Memory plus plentiful hardware, balancing Latency and Throughput.)

ODC - Distributed, Shared Nothing, In Memory, Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The Layers

(Diagram: a Data Layer holding Transactions, Cashflows and Mtms; a Query Layer; an Access Layer exposing Java client APIs; and a Persistence Layer.)

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scales. (Diagram: keys Aa-Ap held by a single partition.)

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it
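To illustrate the trade-off (a sketch with invented names, not a real caching API): a replicated cache is copied in full onto every node, so reads are always in-process but capacity is capped by one node's memory, whereas a partitioned cache spreads keys across nodes, so capacity and bandwidth scale but a read for data you do not own costs a network hop.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative contrast between the two tools.
public class TwoTools {
    // Replicated: every node holds the whole map.
    private final Map<String, Object> replicatedLocalCopy = new ConcurrentHashMap<>();

    public Object replicatedGet(String key) {
        return replicatedLocalCopy.get(key);      // always a local, in-process read
    }

    // Partitioned: this node holds only the keys it owns.
    private final Map<String, Object> ownedPartition = new ConcurrentHashMap<>();

    public Object partitionedGet(String key, Cluster cluster) {
        if (cluster.ownsLocally(key)) {
            return ownedPartition.get(key);       // local hit
        }
        return cluster.fetchFromOwner(key);       // otherwise a network hop
    }

    // Hypothetical routing abstraction for the sketch.
    public interface Cluster {
        boolean ownsLocally(String key);
        Object fetchFromOwner(String key);
    }
}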

So we have some data. Our data is bound together in a model.

(Diagram: a Trade linked to a Party and a Trader, which in turn reference sub-entities such as Desk and Name.)

Which we save

(Diagram: the Trade, Party and Trader objects written to the store, landing on different machines.)

Binding them back together involves a "distributed join" ⇒ lots of network hops

(Diagram: Trade, Party and Trader sitting on different nodes, joined back together across the network.)

The hops have to be spread over time

(Diagram: each hop occupies its own slot on the network as time passes.)

Lots of network hops makes it slow

OK - what if we held it all together: "Denormalised"

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

…and that means managing consistency over

lots of copies

…and all the duplication means you run out of space really quickly

Space issues are exacerbated further when

data is versioned

(Diagram: the denormalised Trade/Party/Trader group duplicated again for Version 1, Version 2, Version 3 and Version 4.)

…and you need versioning to do MVCC

And reconstituting a previous time slice becomes very difficult.

(Diagram: many versions of Trade, Party and Trader scattered through the store.)
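A minimal sketch of what versioning means here, assuming invented names rather than any particular product's API: each write appends an immutable version keyed by business key plus version number, and an "as of" read picks, per entity, the newest version at or before the requested point.

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative MVCC-style store: writes append versions, reads pick a time slice.
// (TreeMap is not thread-safe per key; acceptable for a sketch only.)
public class VersionedStore<K, V> {
    private final ConcurrentMap<K, NavigableMap<Long, V>> versions = new ConcurrentHashMap<>();

    public void write(K key, long version, V value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(version, value);
    }

    // "As of" read: the newest version no later than the requested one.
    public V readAsOf(K key, long asOfVersion) {
        NavigableMap<Long, V> history = versions.get(key);
        if (history == null) return null;
        Map.Entry<Long, V> entry = history.floorEntry(asOfVersion);
        return entry == null ? null : entry.getValue();
    }
}

With normalised, singleton entities a time slice is one floor lookup per entity; with denormalised copies the same version has to be found and reconciled in every duplicate, which is why reconstitution gets so hard.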

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

(Diagram: normalised Trade, Party and Trader held separately; each is independently versioned and each datum is a singleton.)

Binding them back together involves a "distributed join" ⇒ lots of network hops

(Diagram: the normalised Trade, Party and Trader joined back together across the network.)

Whereas in the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.

(Diagram: entities related through common keys vs entities related through crosscutting keys.)

We tackle this problem with a hybrid model

(Diagram: Trade is partitioned; Party and Trader are replicated.)

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big; dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts ⇒ big, common keys
Dimensions ⇒ small, crosscutting keys

We remember we are a grid. We should avoid the

distributed join

… so we only want to 'join' data that is in the same

process

(Diagram: Trades and MTMs share a common key, so they land in the same partition.)

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
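As a hedged illustration of what such a policy looks like: the sketch below is in the spirit of Coherence's KeyAssociation (as I recall it, the interface lives in com.tangosol.net.cache and exposes getAssociatedKey(); treat the exact package and signature as an assumption, and the MtmKey class is invented). The MTM key reports its trade id, so the MTM entry is stored in the same partition as its Trade and the two can be joined in-process.

import java.io.Serializable;
import java.util.Objects;
import com.tangosol.net.cache.KeyAssociation;

// Sketch: an MTM cache key that collocates with its parent Trade's partition.
public class MtmKey implements KeyAssociation, Serializable {
    private final String mtmId;
    private final String tradeId;   // the Fact's partitioning key

    public MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;             // collocate this entry with the owning Trade
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MtmKey)) return false;
        MtmKey other = (MtmKey) o;
        return mtmId.equals(other.mtmId) && tradeId.equals(other.tradeId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(mtmId, tradeId);
    }
}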

So we prescribe different physical storage for Facts

and Dimensions

(Diagram: Trade partitioned; Party and Trader replicated.)

Facts are partitioned; dimensions are replicated

(Diagram: the Data Layer, with Transactions, Cashflows and Mtms held in partitioned Fact Storage, a Query Layer above, and the Trade/Party/Trader model mapped onto it.)

Facts are partitioned; dimensions are replicated

(Diagram: Transactions, Cashflows and Mtms sit in partitioned Fact Storage, so Facts are distributed/partitioned, while Dimensions are replicated.)

The data volumes back this up as a sensible hypothesis

Facts ⇒ big ⇒ distribute
Dimensions ⇒ small ⇒ replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

(Diagram: replicate the small, crosscutting entities; distribute the large, key-aligned ones.)

So how do they help us to run queries without

distributed joins

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = 'CC1'

What would this look like without this pattern

(Diagram: a chain of calls spread across the network over time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers.)

But by balancing Replication and Partitioning we don't need all

those hops

Stage 1: Focus on the where clause

Where Cost Centre = 'CC1'

(Diagram: Transactions, Cashflows and Mtms in partitioned storage.)

Stage 1: Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = 'CC1'

(Diagram: Transactions, Cashflows and Mtms in partitioned storage.)

Stage 2: Cluster Join to get Facts

Join Dimensions in Query Layer
Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated

(Diagram: Transactions, Cashflows and Mtms in partitioned storage.)

Stage 3: Augment raw Facts with relevant Dimensions

Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
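The three stages can be read as pseudocode. Below is a hedged, Java-flavoured sketch, with every name invented rather than taken from the ODC API, of a query layer that joins dimensions locally, fans the fact join out to the partitions where the facts are collocated, and then decorates the results.

import java.util.List;
import java.util.Set;

// Illustrative three-stage query, mirroring the slides.
public class QueryLayer {
    private final DimensionCache dimensions;   // replicated, so all reads are local
    private final FactGrid facts;              // partitioned by the fact key

    public QueryLayer(DimensionCache dimensions, FactGrid facts) {
        this.dimensions = dimensions;
        this.facts = facts;
    }

    public List<EnrichedFact> query(String costCentre) {
        // Stage 1: local dimension joins resolve the fact keys to ask for.
        Set<String> factKeys = dimensions.factKeysForCostCentre(costCentre);

        // Stage 2: each partition joins its collocated Transactions, MTMs and Cashflows.
        List<JoinedFact> joined = facts.joinCollocatedFacts(factKeys);

        // Stage 3: bind the (locally held) dimensions onto the returned facts.
        return joined.stream()
                .map(f -> new EnrichedFact(f, dimensions.lookup(f.dimensionKeys())))
                .toList();
    }

    // Hypothetical collaborators, sketched only far enough to compile.
    public interface DimensionCache {
        Set<String> factKeysForCostCentre(String costCentre);
        List<Object> lookup(Set<String> dimensionKeys);
    }
    public interface FactGrid {
        List<JoinedFact> joinCollocatedFacts(Set<String> factKeys);
    }
    public interface JoinedFact {
        Set<String> dimensionKeys();
    }
    public record EnrichedFact(JoinedFact fact, List<Object> dimensions) {}
}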

Bringing it together

(Diagram: a Java client API over Replicated Dimensions and Partitioned Facts.)

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do this…

(Diagram: the normalised Trade, Party and Trader model, held separately.)

…and this…

(Diagram: independent versions 1 to 4 of Trade, Party and Trader.)

and this

(Diagram: the many scattered versions reassembled into a previous time slice.)

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

Facts vs Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space ⇒ Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate

'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date.

(Diagram: the Data Layer's partitioned Fact Storage for Transactions, Cashflows and Mtms, with replicated Dimension Caches in the Processing Layer.)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st level references to be triggered

(Diagram: saving a Trade into the partitioned cache fires a trigger via the cache store; the Trade's first-level references (Party, Alias, Source Book, Ccy) are pushed from the Data Layer, where everything is normalised, to the Query Layer's connected dimension caches.)

This updates the connected caches

(Diagram: the Query Layer's connected dimension caches now hold the Party, Alias, Source Book and Ccy referenced by the saved Trade.)

The process recurses through the object graph

(Diagram: the recursion continues out from the Trade, pulling in the Party's own references, such as its LedgerBook, from the normalised Data Layer.)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
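A hedged sketch of that recursion, with invented names and none of the cache trigger plumbing the real system would use: when a fact is saved, walk its references, push any dimension not already present into the replicated connected caches, and recurse.

import java.util.HashSet;
import java.util.Set;

// Illustrative Connected Replication: replicate only dimensions reachable from facts.
public class ConnectedReplicator {
    private final ReplicatedDimensionCache connectedCache;   // hypothetical replicated cache

    public ConnectedReplicator(ReplicatedDimensionCache connectedCache) {
        this.connectedCache = connectedCache;
    }

    // Called from the data layer whenever a fact (e.g. a Trade) is written.
    public void onFactSaved(Entity fact) {
        replicateReferences(fact, new HashSet<>());
    }

    private void replicateReferences(Entity entity, Set<Object> visited) {
        for (Entity dimension : entity.references()) {           // 1st level refs: Party, Book, Ccy...
            if (!visited.add(dimension.key())) continue;          // guard against cycles
            if (!connectedCache.contains(dimension.key())) {
                connectedCache.put(dimension.key(), dimension);   // now 'connected', so replicate it
            }
            replicateReferences(dimension, visited);              // recurse through the object graph
        }
    }

    // Hypothetical abstractions for the sketch.
    public interface Entity {
        Object key();
        Iterable<Entity> references();
    }
    public interface ReplicatedDimensionCache {
        boolean contains(Object key);
        void put(Object key, Entity value);
    }
}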

With 'Connected Replication' only 1/10th of the data

needs to be replicated (on

average)

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning we can do any join in a single step


Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The End

• Further details online: http://www.benstopford.com
• Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Computers really are very fast

The problem is wersquore quite good at writing software that

slows them down

Question

Is it fair to compare the performance of a Database with a HashMap

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

The problem is wersquore quite good at writing software that

slows them down

Question

Is it fair to compare the performance of a Database with a HashMap

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

(Trade, Party, Alias, Source, Book, Ccy, plus Party's LedgerBook. Data Layer, all normalised; Query Layer with connected dimension caches.)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
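A minimal Java sketch of the pattern, assuming a hypothetical Entity abstraction that exposes its foreign-key references; the cache map and the trigger entry point are stand-ins rather than Coherence or ODC types.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConnectedReplicationSketch {

    // Stand-in for a replicated dimension cache (one full copy per node).
    static Map<Object, Object> replicatedDimensionCache = new HashMap<>();

    // Hypothetical domain abstraction: every entity can list the entities it references.
    interface Entity {
        Object key();
        boolean isDimension();
        List<Entity> references();   // the arcs (foreign keys) on the domain model
    }

    // Invoked from the fact store's write path (e.g. a cache-store trigger)
    // whenever a fact such as a Trade is saved or updated.
    static void onFactSaved(Entity fact) {
        for (Entity reference : fact.references()) {
            replicateConnected(reference);
        }
    }

    // Recurse through the foreign keys, replicating only dimensions that are
    // actually reachable from ('connected' to) a stored fact.
    static void replicateConnected(Entity entity) {
        if (entity.isDimension() && !replicatedDimensionCache.containsKey(entity.key())) {
            replicatedDimensionCache.put(entity.key(), entity);
            for (Entity reference : entity.references()) {
                replicateConnected(reference);   // e.g. from Party onwards to LedgerBook
            }
        }
    }
}

Because the recursion only ever starts from a saved fact, a dimension that no stored fact reaches is never pulled into the replicated caches.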

Limitations of this approach:
• Data set size: the size of connected dimensions limits scalability.
• Joins are only supported between 'Facts' that can share a partitioning key, as sketched below (but any dimension join can be supported).
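The partitioning-key requirement is what the earlier 'Key Assignment Policy (e.g. KeyAssociation in Coherence)' remark was about. Below is a sketch of how an MTM key might declare the owning trade id as its associated key; the MtmKey class itself is an illustrative assumption.

import com.tangosol.net.cache.KeyAssociation;

// Sketch only: the MTM's cache key exposes the owning trade's id as its
// associated key, so MTM entries are stored in the same partition as their
// Trade and the Stage 2 join stays local.
public class MtmKey implements KeyAssociation {

    private final long mtmId;
    private final long tradeId;   // the common key shared with the Trade

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;
    }

    // equals() and hashCode() over both fields are needed for a real cache key;
    // omitted here for brevity.
}

With the trade id as the associated key, a Transaction and its MTMs always land in the same partition, which is exactly what the Stage 2 cluster join relies on.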

Conclusion

• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.

Conclusion

At one end of the scale are the huge shared-nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning, we can do any join in a single step.

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Question

Is it fair to compare the performance of a Database with a HashMap

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No disk: all data is held in RAM

These three non-functional themes lie behind the design of ODC, RBS's in-memory data warehouse

ODC

ODC represents a balance between throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of work/messages in a prescribed amount of time

Which is best for latency

[Diagram: latency scale comparing a Traditional Database with a Shared Nothing (Distributed) In-Memory Database]

Which is best for throughput

[Diagram: throughput scale comparing a Traditional Database with a Shared Nothing (Distributed) In-Memory Database]

So why do we use distributed in-memory

[Diagram: 'In Memory' and 'Plentiful hardware' mapped to Latency and Throughput]

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record (persistence)

2TB of RAM

The Layers

[Diagram: Access Layer (Java client API), Query Layer, Data Layer (Transactions, Cashflows, Mtms), Persistence Layer]

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scales

Keys Aa–Ap

Scalable storage, bandwidth and processing

Associating data in different partitions implies moving it

So we have some data. Our data is bound together in a model

[Diagram: domain model linking Trade, Party, Trader, Desk, Name, Sub…]

Which we save

[Diagram: the Trade, Party and Trader entities in the store]

Binding them back together involves a "distributed join" => lots of network hops

The hops have to be spread over time

[Diagram: the hops laid out along network and time axes]

Lots of network hops make it slow

OK – what if we held it all together? "Denormalised"

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some sub-entities

…and that means managing consistency over lots of copies

…and all the duplication means you run out of space really quickly

Space issues are exacerbated further when data is versioned

[Diagram: four versions of the Trade, Party and Trader graph, one full copy per version]

…and you need versioning to do MVCC

And reconstituting a previous time slice becomes very difficult

[Diagram: many versions of Trade, Party and Trader to stitch back together]

So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

[Diagram: the object graph split across machines; Trade, Party and Trader held separately, each independently versioned, and each piece of data a singleton]

Binding them back together involves a "distributed join" => lots of network hops

Whereas in the denormalised model the join is already done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate

Common Keys

Crosscutting Keys

We tackle this problem with a hybrid model

[Diagram: Trade partitioned; Party and Trader replicated]

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small

Facts have one key that relates them all (used to partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts => Big, common keys

Dimensions => Small, crosscutting keys

We remember we are a grid. We should avoid the distributed join

… so we only want to 'join' data that is in the same process

[Diagram: Trades and MTMs collocated via a common key]

Use a key assignment policy (e.g. KeyAssociation in Coherence)
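As a rough sketch of what such a key assignment policy can look like in Coherence, the example below gives each MTM key an associated key equal to its trade id, so the MTM lands in the same partition as its parent Trade. MtmKey and its fields are illustrative assumptions, not ODC's real classes; check your Coherence version for the exact KeyAssociation contract:

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Illustrative sketch: collocate each MTM with its parent Trade by sharing the
// Trade's partitioning key. Not ODC's actual key classes.
public class MtmKey implements KeyAssociation, Serializable {
    private final String mtmId;
    private final String tradeId;   // the common key shared with the Trade fact

    public MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    // Coherence routes this key to the partition owning the associated key,
    // so joins between Trade and MTM never leave the process.
    @Override
    public Object getAssociatedKey() {
        return tradeId;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MtmKey && ((MtmKey) other).mtmId.equals(mtmId);
    }

    @Override
    public int hashCode() {
        return mtmId.hashCode();
    }
}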

So we prescribe different physical storage for Facts and Dimensions

[Diagram: Trade partitioned; Party and Trader replicated]

Facts are partitioned, dimensions are replicated

[Diagram: Data Layer with Fact Storage (Partitioned) holding Transactions, Cashflows and Mtms; a Query Layer above; the Trade, Party and Trader model alongside]

Facts are partitioned, dimensions are replicated

[Diagram: Facts (distribute/partition): Transactions, Cashflows and Mtms in Fact Storage (Partitioned); Dimensions (replicate)]

The data volumes back this up as a sensible hypothesis

Facts => Big => Distribute

Dimensions => Small => Replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key

Replicate

Distribute

So how do they help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a chain of sequential calls spread across the network over time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers]

But by balancing Replication and Partitioning we don't need all those hops

Stage 1: Focus on the where clause

Where Cost Centre = 'CC1'

[Partitioned Storage: Transactions, Cashflows, Mtms]

Stage 1: Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

[Partitioned Storage: Transactions, Cashflows, Mtms]

Stage 2: Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated

[Partitioned Storage: Transactions, Cashflows, Mtms]

Stage 3: Augment raw Facts with relevant Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
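A hedged, in-memory sketch of the three stages follows. Plain maps stand in for the replicated dimension caches and the partitioned fact storage, and every name here is an illustrative assumption rather than ODC's API:

import java.util.*;
import java.util.stream.Collectors;

// Illustrative three-stage query, mirroring the stages above.
public class ThreeStageQuery {

    // Stage 1: resolve the where clause against replicated dimensions to get the
    // partitioning keys (trade ids) whose facts we need.
    static Set<String> stage1TradeIds(String costCentre, Map<String, String> costCentreByTradeId) {
        return costCentreByTradeId.entrySet().stream()
                .filter(e -> e.getValue().equals(costCentre))
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    // Stage 2: join the facts that share a trade id. In the grid these rows are
    // collocated by the partitioning key, so this join never crosses partitions.
    static Map<String, String[]> stage2JoinFacts(Set<String> tradeIds,
                                                 Map<String, String> transactionByTradeId,
                                                 Map<String, String> mtmByTradeId) {
        Map<String, String[]> rows = new LinkedHashMap<>();
        for (String id : tradeIds) {
            rows.put(id, new String[]{transactionByTradeId.get(id), mtmByTradeId.get(id)});
        }
        return rows;
    }

    // Stage 3: bind the relevant reference data (a replicated dimension) to each row.
    static List<String> stage3BindDimensions(Map<String, String[]> factRows,
                                             Map<String, String> referenceDataByTradeId) {
        List<String> result = new ArrayList<>();
        factRows.forEach((id, facts) ->
                result.add(facts[0] + " | " + facts[1] + " | " + referenceDataByTradeId.get(id)));
        return result;
    }
}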

Bringing it together

[Diagram: Java client API querying Replicated Dimensions and Partitioned Facts]

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results

We get to do this…

[Diagram: Trade, Party and Trader held normalised across machines]

…and this…

[Diagram: multiple versions of each entity held side by side]

and this

[Diagram: a previous time slice reconstituted from the versioned entities]

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

Facts

Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

[Diagram: Processing Layer holding replicated Dimension Caches; Data Layer holding Fact Storage (Partitioned) with Transactions, Cashflows and Mtms]

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st level references to be triggered

[Diagram: Trade referencing Party, Alias, Source, Book and Ccy; Save Trade goes to the Partitioned Cache, whose Cache Store trigger spans the Data Layer (All Normalised) and the Query Layer (with connected dimension caches)]

This updates the connected caches

[Diagram: Trade, Party, Alias, Source, Book and Ccy; Data Layer (All Normalised); Query Layer (with connected dimension caches)]

The process recurses through the object graph

[Diagram: the recursion reaches further dimensions such as LedgerBook via Party; Data Layer (All Normalised); Query Layer (with connected dimension caches)]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
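A minimal sketch of that recursion is shown below, assuming a generic Entity with a key and foreign-key references. The Entity type and the cache map are illustrative assumptions, not ODC's implementation:

import java.util.*;

// Illustrative sketch of Connected Replication: when a fact is written, walk its
// foreign-key references recursively and copy only the dimensions it actually uses
// into the replicated 'connected' caches.
public class ConnectedReplication {

    interface Entity {
        Object key();
        List<Entity> references();   // 1st-level foreign-key references (Party, Book, Ccy, ...)
    }

    private final Map<Object, Entity> connectedDimensionCache = new HashMap<>();

    // Triggered (for example by a cache store) whenever a fact such as a Trade is saved.
    public void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicateConnected(dimension, new HashSet<>());
        }
    }

    // Recurse through the arcs of the domain model, stopping at entities already visited.
    private void replicateConnected(Entity entity, Set<Object> visited) {
        if (!visited.add(entity.key())) {
            return;                                          // already handled on this pass
        }
        connectedDimensionCache.put(entity.key(), entity);   // push into the replicated layer
        for (Entity next : entity.references()) {
            replicateConnected(next, visited);
        }
    }
}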

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning, we can do any join in a single step

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End
• Further details online: http://www.benstopford.com
• Questions


bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

• Massive storage potential
• Massive scalability of processing
• Popular for high-level storage solutions
• Commodity hardware
• Around since the 80s, but only really popular since the Big Data era
• Limited by cross-partition joins

Each machine is responsible for a subset of the records; each record exists on only one machine.
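To make this concrete, here is a minimal, illustrative sketch (not from the talk) of how a shared nothing store might route each record key to the single node that owns it, assuming a simple hash-based assignment:

import java.util.List;

// Minimal sketch of shared nothing routing: every record key maps to exactly one owning node.
public class ShardRouter {

    private final List<String> nodes;   // e.g. ["node-0", "node-1", "node-2"]

    public ShardRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    // Hash the key into one of the buckets; that node alone stores and serves the record.
    public String ownerOf(Object recordKey) {
        int bucket = Math.floorMod(recordKey.hashCode(), nodes.size());
        return nodes.get(bucket);
    }
}

A client asks the router which node owns key 97 and sends the read or write straight there; no other node holds that record. Real grids use consistent hashing or partition tables so that adding a node does not reshuffle every key, but the ownership principle is the same.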

3 The In Memory Database (single address-space)

Databases must cache subsets of the data in memory.

Not knowing what you don't know

Most queries still go to disk to "see what they missed". (Diagram: 90% of the data in cache, the rest on disk.)

If you can fit it ALL in memory you know everything

The architecture of an in-memory database

• All data is at your fingertips
• Query plans become less important as there is no IO
• Intermediary results are just pointers

Memory is at least 100x faster than disk

(Latency scale, picoseconds to milliseconds: L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, cross-network round trip, 1MB from disk/network, cross-continental round trip.)

An L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel 20cm.

Random vs Sequential Access: memory allows random access; disk only works well for sequential reads.

This makes them very fast

The proof is in the stats: TPC-H benchmarks on a 1TB data set

• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

NB – TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc SuperCluster.

So why haven't in-memory databases taken off?

Address spaces are relatively small and of a finite, fixed size.

• What happens when your data grows beyond your available memory?

The 'One more bit' problem

Durability

What happens when you pull the plug?

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data, but this time only using RAM.

Distribution solves our two problems

• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing.

But at the cost of losing the single address space.

(The architecture landscape: Traditional, Shared Disk, In Memory, Shared Nothing, Distributed In Memory, Simpler Contract.)

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is Latency?

Latency is a measure of response time.

What is Throughput?

Throughput is a measure of the consumption of work/messages in a prescribed amount of time.

Which is best for latency?

(Spectrum: Traditional Database … Shared Nothing (Distributed) In-Memory Database.)

Which is best for throughput?

(Spectrum: Traditional Database … Shared Nothing (Distributed) In-Memory Database.)

So why do we use distributed in-memory?

(In-memory for latency; plentiful hardware for throughput.)

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised, Realtime Graph DB

• 450 processes
• 2TB of RAM
• Messaging (topic based) as a system of record (persistence)

The Layers

(Diagram: an Access Layer exposing the Java client API, a Query Layer, a Data Layer holding Transactions, Cashflows and MTMs, and a Persistence Layer.)

Three Tools of Distributed Data Architecture:

• Indexing
• Replication
• Partitioning

How should we use these tools?

Replication puts data everywhere

Wherever you go, the data will be there. But your storage is limited by the memory on a single node.

Partitioning scales (each node owns a key range, e.g. keys Aa-Ap)

Scalable storage, bandwidth and processing. But associating data in different partitions implies moving it.

So we have some data. Our data is bound together in a model.

(Object model: Trade linked to Party, Trader, Desk, and so on.)

Which we save.

Binding them back together involves a "distributed join" => lots of network hops.

The hops have to be spread over time


Lots of network hops make it slow.

OK – what if we held it all together, "denormalised"?

Hence denormalisation is FAST (for reads).

Denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.

Space issues are exacerbated further when data is versioned…

(Diagram: the whole Trade/Party/Trader graph duplicated as Version 1, Version 2, Version 3, Version 4.)

…and you need versioning to do MVCC.

And reconstituting a previous time slice becomes very difficult.

So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage.

Remember: this means the object graph will be split across multiple machines.

(Each entity is independently versioned, and each piece of data is a singleton.)

Binding them back together again involves a "distributed join" => lots of network hops.

Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate.

We tackle this problem with a hybrid model: Trade is partitioned; Party and Trader are replicated.

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition).

Dimensions have many keys (which crosscut the partitioning key).
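As a purely illustrative sketch (the class and field names are invented, not ODC's actual model), this is the shape of the problem: the Fact carries the single partitioning key, while the Dimensions it references have their own, crosscutting keys.

// Illustrative only: a Fact carries the partitioning key; Dimensions have their own keys.
class Trade {                  // Fact: big, partitioned by tradeId
    long tradeId;              // the one key that relates the Facts (the partitioning key)
    long partyId;              // crosscutting key into the Party dimension
    long traderId;             // crosscutting key into the Trader dimension
    double notional;
}

class Party {                  // Dimension: small, replicated
    long partyId;              // its own key, unrelated to tradeId
    String name;
}

class Trader {                 // Dimension: small, replicated
    long traderId;
    String desk;
}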

Looking at the data:

Facts => big, common keys
Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join…

…so we only want to 'join' data that is in the same process.

Trades and MTMs share a common key, so we use a key assignment policy (e.g. KeyAssociation in Coherence) to keep them in the same partition.
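A sketch of how that key assignment might look with Coherence's KeyAssociation interface (the class and field names are illustrative, not ODC's real code): the MTM's cache key reports its parent trade id as the associated key, so each MTM is stored in the same partition as its Trade and the join never leaves the process.

import java.io.Serializable;
import com.tangosol.net.cache.KeyAssociation;

// Illustrative cache key for an MTM Fact. Returning the parent trade id as the
// associated key tells Coherence to place this entry in the partition that owns the Trade.
public class MtmKey implements KeyAssociation, Serializable {

    private final long mtmId;
    private final long tradeId;    // the common (partitioning) key shared with the Trade

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;            // collocate with the owning Trade
    }

    // equals() and hashCode() omitted for brevity; a real cache key must define both.
}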

So we prescribe different physical storage for Facts and Dimensions: Trade (the Fact) is partitioned; Party and Trader (the Dimensions) are replicated.

Facts are partitioned; dimensions are replicated.

(Diagram: the Data Layer holds the Facts, Transactions, Cashflows and MTMs, in partitioned Fact storage, distributed across the grid; the Dimensions are replicated into the Query Layer.)

The data volumes back this up as a sensible hypothesis:

Facts => big => distribute
Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern? A chain of calls spread over the network and over time: get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs…

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'. Join the Dimensions in the Query Layer to get the right keys with which to query the Facts.

Stage 2: Cluster join to get the Facts. Because the Facts (Transactions, Cashflows, MTMs) share a partitioning key, they are collocated and can be joined together efficiently within the partitioned storage.

Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result in the Query Layer.
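Putting the three stages together, here is a toy, self-contained sketch of the flow (the data and names are invented; a real query would run behind ODC's Java client API): dimensions are joined locally to produce fact keys, the collocated facts are fetched in one targeted step, and the replicated reference data is bound to the result.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Toy sketch of the three-stage query flow; invented types and data, not ODC code.
public class ThreeStageQuery {

    // Replicated dimension data, present in every query-layer process.
    static Map<String, Set<Long>> sourceBooksByCostCentre = Map.of("CC1", Set.of(10L, 11L));
    static Map<Long, String> referenceDataByBook = Map.of(10L, "refA", 11L, "refB");

    // Partitioned fact data: transaction id -> owning source book (normally spread across the grid).
    static Map<Long, Long> bookByTransaction = Map.of(100L, 10L, 101L, 11L, 102L, 99L);

    public static void main(String[] args) {
        // Stage 1: join the dimensions locally to turn the where clause into fact keys.
        Set<Long> bookIds = sourceBooksByCostCentre.getOrDefault("CC1", Set.of());

        // Stage 2: fetch and join the collocated facts for just those keys (one targeted call per partition).
        List<Long> txIds = bookByTransaction.entrySet().stream()
                .filter(e -> bookIds.contains(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());

        // Stage 3: bind the replicated dimension data to the result in the query layer.
        txIds.forEach(tx ->
                System.out.println("tx " + tx + " -> " + referenceDataByBook.get(bookByTransaction.get(tx))));
    }
}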

Bringing it together

(The Java client API sits over Replicated Dimensions and Partitioned Facts.)

We never have to do a distributed join. All the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.

We get to do this… …and this… …and this… …without the problems of this… …or this… …all at the speed of this… well, almost.

(That is: independent versioning, easy time-slicing and no duplication, at close to the speed of the denormalised model.)

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some of it is quite large.

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

(Diagram: the Data Layer holds partitioned Fact storage for Transactions, Cashflows and MTMs, while replicated Dimension Caches sit in the processing layer.)

As new Facts are added, the relevant Dimensions that they reference are moved into the processing-layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its first-level references to be triggered.

(Diagram: "Save Trade" hits the partitioned cache in the Data Layer, all normalised; a cache store trigger fires for the Trade's first-level references, Party, Alias, Source, Book and Ccy, populating the Query Layer's connected dimension caches.)

This updates the connected caches

The process recurses through the object graph

(For example, saving a Trade triggers its Party, Alias, Source, Book and Ccy; recursing further picks up the Party's own references, such as its LedgerBook.)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
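A minimal, self-contained sketch of the idea (entity and cache names invented): when a Fact is saved, the trigger walks its references recursively and replicates each Dimension it touches, and only those.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy sketch of Connected Replication: recurse through a saved Fact's references and
// replicate only the Dimensions that are actually connected to a Fact.
public class ConnectedReplication {

    // A node in the domain model: an entity plus the entities it references (its foreign keys).
    record Entity(String key, List<Entity> references) {}

    // Stand-in for the replicated, query-layer dimension caches.
    static Set<String> connectedCache = new HashSet<>();

    // Called by the cache-store trigger when a Fact (e.g. a Trade) is written.
    static void onFactSaved(Entity fact) {
        fact.references().forEach(ConnectedReplication::replicateConnected);
    }

    // Depth-first walk over the foreign-key arcs; every Dimension visited is replicated once.
    static void replicateConnected(Entity dimension) {
        if (connectedCache.add(dimension.key())) {           // skip anything already replicated
            dimension.references().forEach(ConnectedReplication::replicateConnected);
        }
    }

    public static void main(String[] args) {
        Entity ledgerBook = new Entity("LedgerBook:7", List.of());
        Entity party = new Entity("Party:GOLD", List.of(ledgerBook));
        Entity ccy = new Entity("Ccy:USD", List.of());
        Entity trade = new Entity("Trade:42", List.of(party, ccy));

        onFactSaved(trade);                                   // the Trade itself stays partitioned
        System.out.println(connectedCache);                   // the Party, its LedgerBook and the Ccy are replicated
    }
}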

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step against the partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data, some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

(Diagram: a Data Layer with partitioned Fact Storage (Transactions, Cashflows, MTMs) feeding replicated Dimension Caches in the Processing Layer)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered.

(Diagram: Save Trade hits the Partitioned Cache in the Data Layer (all normalised); a Cache Store trigger fires, and the Trade's first-level references (Party, Alias, SourceBook, Ccy) are pushed towards the connected dimension caches in the Query Layer)

This updates the connected caches

(Diagram: the same object graph, with the triggered references now present in the connected dimension caches of the Query Layer)

The process recurses through the object graph

(Diagram: the recursion continues a level deeper, e.g. Party to LedgerBook, populating the Query Layer caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
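A minimal sketch of the recursion, with hypothetical types (in ODC this work hangs off the cache-store trigger shown above): when a fact is written, walk its foreign-key references and push every dimension reached into the replicated connected caches.

import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the Connected Replication pattern.
public class ConnectedReplicator {

    // Stands in for the replicated 'connected dimension' caches.
    private final Map<Object, Entity> connectedDimensions = new HashMap<Object, Entity>();

    // Called from the trigger when a fact (e.g. a Trade) is written.
    public void onFactWritten(Entity fact) {
        Set<Object> visited = new HashSet<Object>();
        for (Entity referenced : fact.references()) {
            replicateIfConnected(referenced, visited);
        }
    }

    // Recurse through the arcs of the domain model, replicating every
    // dimension that is now 'connected' to at least one stored fact.
    private void replicateIfConnected(Entity dimension, Set<Object> visited) {
        if (!visited.add(dimension.key())) {
            return; // already processed on this pass (also guards against cycles)
        }
        connectedDimensions.put(dimension.key(), dimension);
        for (Entity next : dimension.references()) { // e.g. Party -> LedgerBook
            replicateIfConnected(next, visited);
        }
    }

    // Hypothetical domain abstraction: an entity knows its key and the
    // entities it references via foreign keys.
    public interface Entity {
        Object key();
        Collection<Entity> references();
    }
}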

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning, so we can do any join in a single step against Partitioned Storage.

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com

• Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a "distributed join" => lots of network hops

[Diagram: the join must again reach entities on different nodes]

Whereas in the denormalised model the join is already done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut the only way to collocate is to replicate

[Diagram: entities related by Common Keys vs entities related by Crosscutting Keys]

We tackle this problem with a hybrid model

[Diagram: the Trade is partitioned; the Party and Trader are replicated]

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small

Facts have one key that relates them all (used to partition)

Dimensions have many keys (which crosscut the partitioning key)

Looking at the data

Facts => big, common keys

Dimensions => small, crosscutting keys

We remember we are a grid: we should avoid the distributed join

… so we only want to 'join' data that is in the same process

[Diagram: Trades and their MTMs collocated via a Common Key]

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
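As a sketch of such a policy, the key class below assumes Oracle Coherence's KeyAssociation interface (com.tangosol.net.cache.KeyAssociation); the entity and field names are illustrative. The MTM's cache key declares its parent Trade id as the associated key, so the MTM lands in the same partition as the Trade and the join between them never leaves the process.

```java
import java.io.Serializable;
import java.util.Objects;

import com.tangosol.net.cache.KeyAssociation;

// Illustrative cache key for an MTM fact, collocated with its parent Trade.
public class MtmKey implements KeyAssociation, Serializable {
    private final String mtmId;
    private final String tradeId;   // the common key shared with the Trade fact

    public MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;   // entries whose associated keys are equal land in the same partition
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MtmKey k
                && mtmId.equals(k.mtmId)
                && tradeId.equals(k.tradeId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(mtmId, tradeId);
    }
}
```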

So we prescribe different physical storage for Facts and Dimensions

[Diagram: the Trade is partitioned; the Party and Trader are replicated]

Facts are partitioned, dimensions are replicated

[Diagram: the Query Layer holds replicated dimension caches; the Data Layer holds partitioned Fact Storage for Transactions, Cashflows and MTMs. Facts: distribute/partition. Dimensions: replicate]

The data volumes back this up as a sensible hypothesis

Facts => big => distribute

Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a chain of sequential round trips spread along the network/time axis: Get Cost Centres, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centres]

But by balancing Replication and Partitioning we don't need all those hops

Stage 1: Focus on the where clause

Where Cost Centre = 'CC1'

[Diagram: Transactions, Cashflows and MTMs held in partitioned storage]

Stage 1: Get the right keys to query the Facts

Join Dimensions in the Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Cluster Join to get Facts

Join Dimensions in the Query Layer, then join Facts across the cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated

Stage 3: Augment raw Facts with relevant Dimensions

Join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
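Pulled together as code, a hedged sketch of the three stages might look like the following (the entity shapes, the cache maps and the in-process iteration are illustrative stand-ins for ODC's real query layer): replicated dimensions are read locally to turn the where clause into keys, the fact join runs where the facts are collocated, and the result rows are decorated from the same local dimension caches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative entities: Txns and MTMs are facts, Books and CostCentres are dimensions.
record CostCentre(String id) {}
record Book(String id, String costCentreId) {}
record Txn(String id, String bookId) {}
record Mtm(String id, String txnId, double value) {}
record Row(Txn txn, Mtm mtm, Book book, CostCentre costCentre) {}

final class ThreeStageQuery {
    private final Map<String, CostCentre> costCentres;   // replicated dimension (local reads)
    private final Map<String, Book> books;               // replicated dimension (local reads)
    private final Map<String, Txn> txns;                 // partitioned fact
    private final Map<String, List<Mtm>> mtmsByTxn;      // partitioned fact, collocated with its Txn

    ThreeStageQuery(Map<String, CostCentre> costCentres, Map<String, Book> books,
                    Map<String, Txn> txns, Map<String, List<Mtm>> mtmsByTxn) {
        this.costCentres = costCentres;
        this.books = books;
        this.txns = txns;
        this.mtmsByTxn = mtmsByTxn;
    }

    List<Row> run(String costCentreId) {
        // Stage 1: join the replicated dimensions locally to turn the where clause
        // into a set of keys that the fact query can use.
        Set<String> bookIds = books.values().stream()
                .filter(b -> b.costCentreId().equals(costCentreId))
                .map(Book::id)
                .collect(Collectors.toSet());

        CostCentre costCentre = costCentres.get(costCentreId);
        List<Row> result = new ArrayList<>();

        // Stage 2: join the facts. In the grid this filter runs in parallel inside each
        // partition, and because Txns and MTMs share a partitioning key the join is local.
        for (Txn txn : txns.values()) {
            if (!bookIds.contains(txn.bookId())) continue;
            // Stage 3: bind the relevant replicated dimensions onto each raw fact.
            Book book = books.get(txn.bookId());
            for (Mtm mtm : mtmsByTxn.getOrDefault(txn.id(), List.of())) {
                result.add(new Row(txn, mtm, book, costCentre));
            }
        }
        return result;
    }
}
```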

Bringing it together

[Diagram: the Java client API queries Replicated Dimensions and Partitioned Facts together]

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results

We get to do this…

[Diagram: entities held normalised, each stored only once across the cluster]

…and this…

[Diagram: each entity independently versioned]

and this…

[Diagram: a previous time slice reconstituted from the versioned entities]

…without the problems of this… …or this…

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

[Diagram: the entity model split into Facts and Dimensions]

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution: the Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

[Diagram: the Processing Layer holds replicated Dimension Caches; the Data Layer holds partitioned Fact Storage for Transactions, Cashflows and MTMs]

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered

[Diagram: 'Save Trade' goes into the partitioned cache of the Data Layer (all normalised); a Cache Store trigger fires for the referenced Party, Alias, Source, Book and Ccy so they can be pushed to the Query Layer's connected dimension caches]

This updates the connected caches

[Diagram: the referenced Party, Alias, Source, Book and Ccy now appear in the Query Layer's connected dimension caches]

The process recurses through the object graph

[Diagram: the recursion continues from the Party to its LedgerBook and any further referenced dimensions, each replicated into the Query Layer caches]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
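A minimal sketch of that recursion, assuming nothing about ODC's real trigger or cache APIs (the Dimension and DimensionStore interfaces are invented for illustration): when a fact is written we walk its foreign keys breadth-first, replicating each dimension the first time it is seen, so only dimensions actually connected to a fact ever reach the replicated layer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative abstractions over the normalised data layer and the replicated query layer.
interface Dimension {
    String key();
    List<String> foreignKeys();           // keys of further dimensions this one references
}

interface DimensionStore {
    Dimension lookup(String key);         // read from the normalised, partitioned data layer
    void replicate(Dimension dimension);  // push into the replicated query-layer caches
}

final class ConnectedReplicator {
    private final DimensionStore store;

    ConnectedReplicator(DimensionStore store) {
        this.store = store;
    }

    // Invoked by the trigger/cache store when a fact (e.g. a Trade) is saved,
    // passing the keys of its first-level dimension references.
    void onFactWritten(List<String> firstLevelReferences) {
        Deque<String> toVisit = new ArrayDeque<>(firstLevelReferences);
        Set<String> seen = new HashSet<>();
        while (!toVisit.isEmpty()) {
            String key = toVisit.poll();
            if (!seen.add(key)) continue;                 // each dimension is replicated at most once
            Dimension dimension = store.lookup(key);
            if (dimension == null) continue;
            store.replicate(dimension);                   // now readable locally on every query node
            toVisit.addAll(dimension.foreignKeys());      // recurse through the arcs of the model
        }
    }
}
```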

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability

Conclusion

At the other end are in-memory architectures, ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning we can do any join in a single step

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End

• Further details online: http://www.benstopford.com

• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data, but this time only using RAM.

[Diagram: the same record ranges as before, now partitioned across the RAM of several nodes, addressed by a client]

Distribution solves our two problems

• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing, but at the cost of losing the single address space.

[Diagram: the architecture spectrum – Traditional, Shared Disk, Shared Nothing, In Memory and Distributed In Memory, arranged along an axis of an increasingly simple contract]

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is Latency?

Latency is a measure of response time.

What is Throughput?

Throughput is a measure of the consumption of work (messages) in a prescribed amount of time.

Which is best for latency?

[Diagram: a latency scale running from the Traditional Database to the Shared Nothing (Distributed) In-Memory Database]

Which is best for throughput?

[Diagram: a throughput scale running from the Traditional Database to the Shared Nothing (Distributed) In-Memory Database]

So why do we use distributed in-memory?

[Diagram: in-memory storage plus plentiful hardware gives both low latency and high throughput]

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised, Realtime Graph DB

• 450 processes
• Messaging (topic based) as a system of record (persistence)
• 2TB of RAM

The Layers

[Diagram: an Access Layer of Java client APIs, a Query Layer, a Data Layer holding Transactions, Cashflows and Mtms, and a Persistence Layer beneath]

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools?

Replication puts data everywhere: wherever you go, the data will be there. But your storage is limited by the memory on a node.

Partitioning scales: scalable storage, bandwidth and processing. But associating data in different partitions implies moving it.

So we have some data. Our data is bound together in a model.

[Object model diagram: a Trade linked to a Party and a Trader, which link on to further sub-entities such as Desk and Name]
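To make that model concrete, here is a hypothetical sketch of the entities as plain Java classes; the fields are illustrative, not ODC's real schema. Note that each entity refers to the others by key, which is exactly what later forces a join to rebuild the graph.

    // Illustrative sketch only - entities reference one another by key,
    // so rebuilding the object graph means joining on those keys.
    class Trade {
        long tradeId;    // the fact's own key (later used to partition)
        long partyId;    // reference to a Party
        long traderId;   // reference to a Trader
        double notional;
    }

    class Party {
        long partyId;
        long deskId;     // dimensions carry their own onward references
        String name;
    }

    class Trader {
        long traderId;
        String name;
    }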

Which we save.

[Diagram: the Trade, Party and Trader entities written separately into the store]

Binding them back together involves a "distributed join" => lots of network hops.

[Diagram: Trade, Party and Trader instances held on different machines being joined back together]

The hops have to be spread over time.

[Diagram: successive network hops laid out along a time axis]

Lots of network hops make it slow.

OK – what if we held it all together? "Denormalised".

Hence denormalisation is FAST (for reads).

Denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.

Space issues are exaggerated further when data is versioned.

[Diagram: the whole denormalised Trade/Party/Trader graph duplicated for Version 1, Version 2, Version 3 and Version 4]

…and you need versioning to do MVCC.
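As a generic illustration of what versioning for MVCC can look like (a sketch under simplified, assumed semantics, not ODC's implementation): every write appends a new immutable version, and a reader picks the newest version at or below its snapshot.

    import java.util.Map;
    import java.util.NavigableMap;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Generic MVCC sketch: writes append immutable versions; readers pick
    // the latest version visible at their snapshot.
    class VersionedStore<K, V> {
        private final Map<K, NavigableMap<Long, V>> versions = new ConcurrentHashMap<>();

        public void put(K key, long version, V value) {
            versions.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>())
                    .put(version, value);
        }

        // Read "as of" a snapshot: the newest entry whose version <= snapshotVersion.
        public V getAsOf(K key, long snapshotVersion) {
            NavigableMap<Long, V> history = versions.get(key);
            if (history == null) return null;
            Map.Entry<Long, V> entry = history.floorEntry(snapshotVersion);
            return entry == null ? null : entry.getValue();
        }
    }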

And reconstituting a previous time slice becomes very difficult.

[Diagram: many versions of Trade, Party and Trader entities that must be matched up to rebuild a consistent historical view]

So we want to hold entities separately (normalised), to alleviate concerns around consistency and space usage.

Remember, this means the object graph will be split across multiple machines.

[Diagram: Trade, Party and Trader held on different machines – each entity independently versioned, each piece of data a singleton]

Binding them back together involves a "distributed join" => lots of network hops.

[Diagram: as before, joining the entities means hopping between machines]

Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store at the speed of a denormalised one.

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.

[Diagram: entities related by common keys vs entities related by crosscutting keys]

We tackle this problem with a hybrid model.

[Diagram: the Trade is partitioned; Party and Trader are replicated]

We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions.

Everything starts from a Core Fact (Trades for us).

Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition). Dimensions have many keys (which crosscut the partitioning key).

Looking at the data:

• Facts => big, common keys
• Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join…

…so we only want to 'join' data that is in the same process.

[Diagram: Trades and their MTMs collocated in the same process via a common key]

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence).
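For instance, in Oracle Coherence the child key can implement the KeyAssociation interface (a single getAssociatedKey() method) so that MTMs hash to the same partition as their parent Trade. The sketch below is illustrative: the class and field names are invented, and it assumes that interface behaves as described.

    import com.tangosol.net.cache.KeyAssociation;
    import java.io.Serializable;

    // Sketch: an MTM key that is associated with its parent Trade's key,
    // so the grid stores the MTM in the same partition as the Trade.
    public class MtmKey implements KeyAssociation, Serializable {
        private final long mtmId;
        private final long tradeId;   // the common (partitioning) key

        public MtmKey(long mtmId, long tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }

        @Override
        public Object getAssociatedKey() {
            // Partition by the parent Trade's key, collocating Trades and MTMs.
            return tradeId;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MtmKey
                    && ((MtmKey) o).mtmId == mtmId
                    && ((MtmKey) o).tradeId == tradeId;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(mtmId) + Long.hashCode(tradeId);
        }
    }

The net effect is that a 'join' between a Trade and its MTMs never has to leave the owning partition.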

So we prescribe different physical storage for Facts and Dimensions.

[Diagram: the Trade (fact) is stored partitioned; Party and Trader (dimensions) are stored replicated]

Facts are partitioned, dimensions are replicated.

[Diagram: the Data Layer holds Transactions, Cashflows and Mtms in partitioned Fact Storage (facts: distribute/partition); dimensions such as Party and Trader are replicated into the Query Layer (dimensions: replicate)]

The data volumes back this up as a sensible hypothesis:

• Facts => big => distribute
• Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a chain of sequential, key-shipping lookups over the network – get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs – each one a separate hop along the time axis]

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'.

Stage 1: Get the right keys to query the Facts – join the Dimensions in the Query Layer.

[Diagram: the Query Layer resolves the dimension filter against its replicated dimensions, then addresses the partitioned storage holding Transactions, Cashflows and Mtms]

Stage 2: Cluster join to get the Facts – join the facts together efficiently, as we know they are collocated.

[Diagram: facts joined across the cluster within the partitioned storage (Transactions, Cashflows, Mtms)]

Stage 3: Augment the raw Facts with the relevant Dimensions – bind the relevant dimensions to the result in the Query Layer.
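Pulling the three stages together, here is a single-process simulation of the flow in Java. It is only a sketch: the data, the record types and the way partitions are modelled are invented for illustration, and the real system runs the fact join inside each partition rather than over one local map.

    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    // Single-process simulation of the three query stages; names and data are illustrative.
    public class SnowflakeQuerySketch {

        record Transaction(long tradeId, long bookId, double amount) {}
        record Mtm(long tradeId, double value) {}
        record Row(Transaction txn, Mtm mtm, String book) {}

        // Replicated dimensions: available in full on every query-layer node.
        static final Map<Long, String> booksById = Map.of(1L, "BookA", 2L, "BookB");
        static final Map<String, Set<Long>> costCentreToBooks = Map.of("CC1", Set.of(1L));

        // Partitioned facts: in reality spread over the cluster, keyed by tradeId
        // so a Transaction and its MTM always land in the same partition.
        static final Map<Long, Transaction> transactions = Map.of(
                10L, new Transaction(10L, 1L, 500.0),
                11L, new Transaction(11L, 2L, 900.0));
        static final Map<Long, Mtm> mtms = Map.of(
                10L, new Mtm(10L, 42.0),
                11L, new Mtm(11L, -7.0));

        public static void main(String[] args) {
            // Stage 1: resolve the where-clause against replicated dimensions,
            // producing the keys needed to address the facts.
            Set<Long> bookIds = costCentreToBooks.getOrDefault("CC1", Set.of());

            // Stage 2: join facts to facts; since Transactions and MTMs share the
            // tradeId partitioning key, this join never crosses partitions.
            // Stage 3: bind the replicated dimensions onto the result.
            List<Row> rows = transactions.values().stream()
                    .filter(t -> bookIds.contains(t.bookId()))
                    .map(t -> new Row(t, mtms.get(t.tradeId()), booksById.get(t.bookId())))
                    .collect(Collectors.toList());

            rows.forEach(System.out::println);
        }
    }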

Bringing it together.

[Diagram: the Java client API querying a grid of replicated Dimensions and partitioned Facts]

We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.

We get to do this… and this… and this…

[Diagrams: the normalised entity graph, independently versioned entities, and reconstructed historical time slices, shown again]

…without the problems of this… or this…

[Diagrams: the consistency and space problems of the denormalised, duplicated model, shown again]

…all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: part of the model labelled Facts, part labelled Dimensions]

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store, we keep our 'Connected Caches' up to date.

[Diagram: the Data Layer holds Transactions, Cashflows and Mtms in partitioned Fact Storage; replicated Dimension Caches sit in the Processing Layer]

As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

[Diagram: 'Save Trade' hits the partitioned cache in the Data Layer (all normalised); a cache-store trigger pushes the Trade's direct references (Party, Alias, Source, Book, Ccy) towards the Query Layer's connected dimension caches]

This updates the connected caches.

[Diagram: the referenced dimensions now appear in the Query Layer's connected dimension caches]

The process recurses through the object graph.

[Diagram: second-level references, such as the Party's LedgerBook, are pulled into the connected dimension caches in turn]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
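A hedged sketch of how such a trigger might look in Java. Everything here is hypothetical (the class names, the way foreign keys are extracted, the cache types); it simply shows the shape of the recursion: when a fact is saved, walk its references and push any dimension not already 'connected' into the replicated caches, stopping wherever the cache already holds the key.

    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.function.Function;

    // Hypothetical sketch of the Connected Replication trigger.
    public class ConnectedReplicator {

        // For each entity type, how to extract the keys of the dimensions it references.
        private final Map<Class<?>, Function<Object, Collection<Object>>> foreignKeys;
        // The normalised data layer, holding every entity by key.
        private final Map<Object, Object> dataLayer;
        // Keys of dimensions already replicated into the 'connected' caches.
        private final Set<Object> connectedDimensions;

        public ConnectedReplicator(Map<Class<?>, Function<Object, Collection<Object>>> foreignKeys,
                                   Map<Object, Object> dataLayer,
                                   Set<Object> connectedDimensions) {
            this.foreignKeys = foreignKeys;
            this.dataLayer = dataLayer;
            this.connectedDimensions = connectedDimensions;
        }

        // Invoked by the cache-store trigger whenever a fact (e.g. a Trade) is saved.
        public void onFactSaved(Object fact) {
            replicateReferences(fact);
        }

        private void replicateReferences(Object entity) {
            Function<Object, Collection<Object>> extractor =
                    foreignKeys.getOrDefault(entity.getClass(), e -> List.of());
            for (Object dimensionKey : extractor.apply(entity)) {
                // Only newly 'connected' dimensions are pushed; keys that are already
                // replicated stop the recursion, keeping the walk cheap.
                if (connectedDimensions.add(dimensionKey)) {
                    Object dimension = dataLayer.get(dimensionKey);
                    if (dimension != null) {
                        replicateReferences(dimension);   // recurse through the graph
                    }
                }
            }
        }
    }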

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

2 The Shared Nothing Architecture

• Massive storage potential

• Massive scalability of processing

• Popular for high-level storage solutions

• Commodity hardware
• Around since the '80s, but only really popular since the Big Data era

• Limited by cross-partition joins

Each machine is responsible for a subset of the records. Each record exists on only one machine.

[Diagram: a Client talking to nodes that each hold a disjoint set of keys, e.g. 1, 2, 3…, 97, 98, 99…, 244, 245…, 333, 334…, 765, 769…, 169, 170…]
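As a minimal illustration of the routing idea (plain Java, not any particular product's API, and the node count here is just an example): every record key hashes to exactly one node, so a lookup always knows which single machine owns the record.

public class PartitionRouter {
    private final int nodeCount;

    public PartitionRouter(int nodeCount) { this.nodeCount = nodeCount; }

    // Every key hashes to exactly one node, so each record lives on one machine only.
    public int nodeFor(Object key) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(4);
        for (int recordId : new int[]{1, 2, 3, 97, 98, 99, 333, 334, 765, 769}) {
            System.out.println("record " + recordId + " -> node " + router.nodeFor(recordId));
        }
    }
}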

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in memory

[Diagram: a small Cache sitting in front of the full data set]

Not knowing what you don't know

Most queries still go to disk to "see what they missed"

[Diagram: Data on Disk, with roughly 90% of the hot set in Cache]

If you can fit it ALL in memory you know everything

The architecture of an in-memory database

• All data is at your fingertips

• Query plans become less important as there is no IO

• Intermediary results are just pointers

Memory is at least 100x faster than disk

[Diagram: latency scale running from ms through μs and ns down to ps, covering an L1 cache ref, an L2 cache ref, a main memory ref, reading 1MB from main memory, a cross-network round trip, a cross-continental round trip, and reading 1MB from disk/network]

An L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel 20cm.

Random vs Sequential Access: Memory allows random access. Disk only works well for sequential reads.

This makes them very fast

The proof is in the stats. TPC-H benchmarks on a 1TB data set:
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

• NB – TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc Supercluster.

So why haven't in-memory databases taken off?

Address spaces are relatively small and of a finite, fixed size

• What happens when your data grows beyond your available memory? (The 'one more bit' problem)

Durability

What happens when you pull the plug?

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

[Diagram: the same key ranges as before, now spread across the RAM of several nodes and accessed by a Client]

Distribution solves our two problems

• Solve the 'one more bit' problem by adding more hardware

• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of losing the single address space

[Diagram: the landscape of Traditional, Shared Disk, In-Memory, and Distributed In-Memory (Shared Nothing) architectures, positioned along the axes of distribution and a simpler contract]

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture

• Simplify the contract: improve scalability by picking appropriate ACID properties

• No disk: all data is held in RAM

These three non-functional themes lie behind the design of ODC, RBS's in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency?

Latency is a measure of response time

What is Throughput?

Throughput is a measure of the consumption of work/messages in a prescribed amount of time

Which is best for latency?

[Diagram: a latency spectrum running from the Traditional Database to the Shared Nothing (Distributed) In-Memory Database]

Which is best for throughput?

[Diagram: a throughput spectrum running from the Traditional Database to the Shared Nothing (Distributed) In-Memory Database]

So why do we use distributed in-memory?

[Diagram: In-Memory plus plentiful hardware gives both low latency and high throughput]

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The Layers

[Diagram: an Access Layer (Java client APIs) on top of a Query Layer, on top of a Data Layer holding Transactions, Cashflows and Mtms, backed by a Persistence Layer]

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scales [Diagram: keys Aa–Ap routed to a single partition]

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it
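A toy, in-process contrast of those two tools, using plain Java maps to stand in for cluster nodes (the data and node count are purely illustrative): replication gives every node the full data set, so reads are always local but storage is bounded by one node's memory; partitioning gives each node a slice, so storage scales but relating data across slices means moving it.

import java.util.*;

public class ReplicationVsPartitioning {
    public static void main(String[] args) {
        int nodes = 3;
        Map<String, String> source = Map.of("A", "alpha", "B", "beta", "C", "gamma", "D", "delta");

        // Replication: every 'node' holds a complete copy of the data.
        List<Map<String, String>> replicated = new ArrayList<>();
        for (int n = 0; n < nodes; n++) replicated.add(new HashMap<>(source));

        // Partitioning: each 'node' holds only the keys that hash to it.
        List<Map<String, String>> partitioned = new ArrayList<>();
        for (int n = 0; n < nodes; n++) partitioned.add(new HashMap<>());
        source.forEach((k, v) -> partitioned.get(Math.floorMod(k.hashCode(), nodes)).put(k, v));

        System.out.println("Each replicated node holds " + replicated.get(0).size() + " entries");
        partitioned.forEach(node -> System.out.println("A partitioned node holds only " + node.keySet()));
    }
}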

So we have some data. Our data is bound together in a model.

[Diagram: the domain model, with Trade linked to Party and Trader, and sub-entities such as Desk and Name hanging off them]

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a "distributed join" => lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

[Diagram: the hops laid out across the network over time]

Lots of network hops make it slow

OK – what if we held it all together? "Denormalised"

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

…and that means managing consistency over lots of copies

…and all the duplication means you run out of space really quickly

Space issues are exacerbated further when data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

…and you need versioning to do MVCC

And reconstituting a previous time slice becomes very difficult

[Diagram: many versions of Trade, Party and Trader objects scattered across the store]

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a "distributed join" => lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas in the denormalised model the join is already done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate

[Diagram: entities related via common keys vs entities related via crosscutting keys]

We tackle this problem with a hybrid model

[Diagram: Trade is partitioned; Party and Trader are replicated]

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big; dimensions are small

Facts have one key that relates them all (used to partition)

Dimensions have many keys (which crosscut the partitioning key)

Looking at the data:

Facts => big, common keys

Dimensions => small, crosscutting keys
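As an illustration of that key structure (the field names below are hypothetical, not the ODC schema): the fact carries the single partitioning key that relates facts to one another, plus many crosscutting dimension keys, while each dimension is keyed independently of the partitioning key.

public class SchemaSketch {
    record TradeFact(long tradeId,      // the one partitioning key, shared by related facts
                     String partyId,    // crosscutting dimension keys...
                     String traderId,
                     String bookId,
                     String ccy,
                     double notional) {}

    record PartyDimension(String partyId, String name) {}   // keyed independently of tradeId

    public static void main(String[] args) {
        TradeFact fact = new TradeFact(42L, "P1", "TR7", "B1", "GBP", 1_000_000);
        PartyDimension party = new PartyDimension("P1", "Some Counterparty");
        System.out.println(fact + " joins " + party + " on partyId");
    }
}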

We remember we are a grid. We should avoid the distributed join…

…so we only want to 'join' data that is in the same process

Trades and MTMs share a common key, so use a key assignment policy (e.g. KeyAssociation in Coherence) to keep them in the same partition
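A minimal sketch of the idea behind key association, using hypothetical key classes rather than the real domain model or Coherence's API: the MTM's key declares the Trade key it belongs with, and the router partitions on that associated key, so a Trade and its MTMs always land in the same process and can be joined locally.

public class KeyAssociationSketch {
    static class TradeKey {
        final String tradeId;
        TradeKey(String tradeId) { this.tradeId = tradeId; }
    }

    static class MtmKey {
        final String mtmId;
        final TradeKey associatedKey;   // the crosscutting entity borrows the Fact's key
        MtmKey(String mtmId, TradeKey associatedKey) { this.mtmId = mtmId; this.associatedKey = associatedKey; }
    }

    static int partitionFor(String partitioningKey, int partitions) {
        return Math.floorMod(partitioningKey.hashCode(), partitions);
    }

    public static void main(String[] args) {
        TradeKey trade = new TradeKey("T42");
        MtmKey mtm = new MtmKey("M7", trade);
        int tradePartition = partitionFor(trade.tradeId, 16);
        int mtmPartition = partitionFor(mtm.associatedKey.tradeId, 16);   // routed by the Trade's key
        System.out.println("Trade partition=" + tradePartition + ", MTM partition=" + mtmPartition);
    }
}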

So we prescribe different physical storage for Facts

and Dimensions

[Diagram: Trade is partitioned; Party and Trader are replicated]

Facts are partitioned; dimensions are replicated

[Diagram: the Data Layer holds Transactions, Cashflows and Mtms in partitioned Fact Storage, with a Query Layer above]

Trade

PartyTrader

Facts are partitioned; dimensions are replicated

[Diagram: Facts (Transactions, Cashflows, Mtms) are distributed across partitioned Fact Storage; Dimensions are replicated]

The data volumes back this up as a sensible hypothesis

Facts => big => distribute

Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key

Replicate

Distribute

So how do they help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a chain of round trips spread across the network over time: Get Cost Centres, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centres]

But by balancing Replication and Partitioning we don't need all those hops

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1: Get the right keys to query the Facts

Join Dimensions in the Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2: Cluster Join to get Facts

Join Dimensions in the Query Layer

Join Facts across the cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3: Augment raw Facts with relevant Dimensions

Join Dimensions in the Query Layer

Join Facts across the cluster

Join Dimensions in the Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
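The three stages condensed into a toy, single-process Java sketch (the record types and data are hypothetical, not the ODC model; in the real grid stages 2 and 3 run inside the cluster): stage 1 resolves the where clause against replicated dimensions to get fact keys, stage 2 joins the collocated facts by their shared partitioning key, and stage 3 binds dimension data onto the result.

import java.util.*;
import java.util.stream.*;

public class ThreeStageQuery {
    record Book(String bookId, String costCentre) {}                 // replicated dimension
    record Transaction(String tradeId, String bookId, double qty) {} // partitioned fact
    record Mtm(String tradeId, double value) {}                      // partitioned fact

    public static void main(String[] args) {
        List<Book> books = List.of(new Book("B1", "CC1"), new Book("B2", "CC2"));
        List<Transaction> transactions = List.of(
            new Transaction("T1", "B1", 100), new Transaction("T2", "B2", 50));
        Map<String, Mtm> mtmsByTrade = Map.of("T1", new Mtm("T1", 9.5), "T2", new Mtm("T2", 4.2));

        // Stage 1: join dimensions in the query layer to find the fact keys for 'CC1'.
        Set<String> bookIds = books.stream()
            .filter(b -> b.costCentre().equals("CC1"))
            .map(Book::bookId)
            .collect(Collectors.toSet());

        // Stage 2: join the facts; Transactions and MTMs share the trade key, so in the
        // real grid this join happens inside a single partition.
        // Stage 3: bind the dimension data back onto each result row.
        transactions.stream()
            .filter(t -> bookIds.contains(t.bookId()))
            .forEach(t -> System.out.println(t + " with " + mtmsByTrade.get(t.tradeId())));
    }
}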

Bringing it together

[Diagram: the Java client API querying Replicated Dimensions and Partitioned Facts]

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do this…

Trade

Party

Trader

Trade

Party

Trader

…and this…

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

Facts

Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

[Diagram: the Data Layer holds partitioned Fact Storage (Transactions, Cashflows, Mtms); the Processing Layer holds replicated Dimension Caches]

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its first-level references to be triggered

[Diagram: Save Trade writes into the Partitioned Cache and fires a Cache Store trigger for the Trade's first-level references (Party, Alias, Source, Book, Ccy); Data Layer (all normalised), Query Layer (with connected dimension caches)]

This updates the connected caches

[Diagram: the triggered references (Party, Alias, Source, Book, Ccy) are copied from the normalised Data Layer into the Query Layer's connected dimension caches]

The process recurses through the object graph

[Diagram: the recursion continues from the first-level references to theirs, e.g. from Party to LedgerBook, again updating the Query Layer's connected dimension caches from the normalised Data Layer]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
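A minimal sketch of that recursion, with hypothetical entity classes rather than the real domain model: when a fact is saved, walk its references depth-first and mark every dimension reached as 'connected', so only those are pushed to the replicated caches.

import java.util.*;

public class ConnectedReplicationSketch {
    static class Entity {
        final String name;
        final List<Entity> references = new ArrayList<>();   // the foreign-key arcs
        Entity(String name) { this.name = name; }
    }

    // Recurse through the arcs of the domain model, collecting connected dimensions.
    static void markConnected(Entity entity, Set<String> connected) {
        for (Entity ref : entity.references) {
            if (connected.add(ref.name)) {    // only recurse into dimensions not yet seen
                markConnected(ref, connected);
            }
        }
    }

    public static void main(String[] args) {
        Entity party = new Entity("Party"), ledgerBook = new Entity("LedgerBook");
        Entity book = new Entity("Book"), ccy = new Entity("Ccy"), alias = new Entity("Alias");
        Entity trade = new Entity("Trade");
        trade.references.addAll(List.of(party, book, ccy));
        party.references.addAll(List.of(alias, ledgerBook));

        Set<String> connected = new LinkedHashSet<>();
        markConnected(trade, connected);      // triggered by 'save trade'
        System.out.println("Replicate only: " + connected);
    }
}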

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning we can do any join in a single step

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End

• Further details online: http://www.benstopford.com

• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO

2 The Shared Nothing Architecture

• Massive storage potential
• Massive scalability of processing
• Popular for high-level storage solutions
• Commodity hardware
• Around since the 80's, but only really popular since the BigData era
• Limited by cross-partition joins

Each machine is responsible for a subset of the records. Each record exists on only one machine.

[Diagram: a client and a set of nodes, each node holding a disjoint subset of record keys (1, 2, 3…; 97, 98, 99…; 244, 245…; 333, 334…; 765, 769…; 169, 170…)]
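To make the routing concrete, here is a minimal sketch (not the deck's code) of how a shared nothing store assigns each record to exactly one node: the owner is a pure function of the key, so any client can go straight to the right machine. Class and method names are illustrative.

import java.util.HashMap;
import java.util.Map;

// Minimal sketch: hash-partitioning records across N nodes so that each record
// lives on exactly one machine and any client can route a lookup directly.
public class HashPartitioner {

    private final int nodeCount;
    private final Map<Integer, Map<String, String>> nodes = new HashMap<>();

    public HashPartitioner(int nodeCount) {
        this.nodeCount = nodeCount;
        for (int i = 0; i < nodeCount; i++) {
            nodes.put(i, new HashMap<>());       // stand-in for the memory of one machine
        }
    }

    // The owning node is a pure function of the key, so no central directory is needed.
    private int ownerOf(String recordKey) {
        return Math.floorMod(recordKey.hashCode(), nodeCount);
    }

    public void put(String recordKey, String record) {
        nodes.get(ownerOf(recordKey)).put(recordKey, record);
    }

    public String get(String recordKey) {
        return nodes.get(ownerOf(recordKey)).get(recordKey);  // single-node lookup, no broadcast
    }
}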

3 The In Memory Database (single address-space)

Databases must cache subsets of the data in memory.

Not knowing what you don't know: most queries still go to disk to "see what they missed".

[Diagram: 90% of the data in cache, the rest on disk]

If you can fit it ALL in memory you know everything

The architecture of an in memory database
• All data is at your fingertips
• Query plans become less important as there is no IO
• Intermediary results are just pointers

Memory is at least 100x faster than disk

[Chart: access latencies from picoseconds to milliseconds: L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, cross-network round trip, cross-continental round trip, 1MB from disk/network]

L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel 20cm.

Random vs Sequential Access: memory allows random access; disk only works well for sequential reads.

This makes them very fast

The proof is in the stats. TPC-H benchmarks on a 1TB data set:
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH
• NB: TPC-H is a decision support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc Supercluster.

So why haven't in-memory databases taken off?

Address-spaces are relatively small and of a finite, fixed size
• What happens when your data grows beyond your available memory? The 'one more bit problem'.

Durability: what happens when you pull the plug?

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data, but this time only using RAM.

[Diagram: a client and a set of nodes, each holding a disjoint subset of record keys in memory]

Distribution solves our two problems:
• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine (sketched below)
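As a rough illustration of the backup point, here is a hedged sketch of primary/backup copies in a distributed in-memory store: every write lands on the owning node and, synchronously, on a different machine, so a single node failure loses nothing. The layout and names are illustrative, not a real grid implementation.

import java.util.HashMap;
import java.util.Map;

// Hedged sketch: each key has an in-memory primary copy on its owning node
// and an in-memory backup copy on a different node.
class PrimaryBackupStore {

    private final int nodeCount;
    private final Map<Integer, Map<String, String>> primaries = new HashMap<>();
    private final Map<Integer, Map<String, String>> backups = new HashMap<>();

    PrimaryBackupStore(int nodeCount) {
        this.nodeCount = nodeCount;
        for (int i = 0; i < nodeCount; i++) {
            primaries.put(i, new HashMap<>());
            backups.put(i, new HashMap<>());
        }
    }

    void put(String key, String value) {
        int owner = Math.floorMod(key.hashCode(), nodeCount);
        int backupNode = (owner + 1) % nodeCount;     // a different machine (for nodeCount > 1)
        primaries.get(owner).put(key, value);         // in-memory primary copy
        backups.get(backupNode).put(key, value);      // backup copy survives one node failure
    }

    String get(String key) {
        return primaries.get(Math.floorMod(key.hashCode(), nodeCount)).get(key);
    }
}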

We get massive amounts of parallel processing, but at the cost of losing the single address space.

[Diagram: architecture landscape: Traditional, Shared Disk, In Memory, Shared Nothing and Distributed In Memory, plotted against a 'Simpler Contract' axis]

Key Point 4: There are three key forces:
• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is Latency? Latency is a measure of response time.

What is Throughput? Throughput is a measure of the amount of work (messages) consumed in a prescribed amount of time.

Which is best for latency? [Diagram: latency spectrum from the Traditional Database to the Shared Nothing (Distributed) In-Memory Database]

Which is best for throughput? [Diagram: throughput spectrum from the Traditional Database to the Shared Nothing (Distributed) In-Memory Database]

So why do we use distributed in-memory? [Diagram: In Memory plus plentiful hardware addresses both latency and throughput]

ODC: a Distributed, Shared Nothing, In Memory, Semi-Normalised, Realtime Graph DB
• 450 processes
• 2TB of RAM
• Messaging (topic based) as a system of record (persistence)

The Layers

[Diagram: an Access Layer (Java client APIs), a Query Layer, a Data Layer holding Transactions, Cashflows and MTMs, and a Persistence Layer]

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scales. [Diagram: partitions split by key range, e.g. keys Aa-Ap]

Scalable storage, bandwidth and processing

Associating data in different partitions implies moving it

So we have some data. Our data is bound together in a model.

[Diagram: a Trade linked to Party, Trader, Desk, Name and other sub-entities]

Which we save. [Diagram: Trade, Party and Trader entities written to the store]

Binding them back together involves a "distributed join" => lots of network hops.

[Diagram: Trade, Party and Trader held on different nodes]

The hops have to be spread over time. [Diagram: sequential network calls laid out along a time axis]

Lots of network hops make it slow.
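A hedged sketch of why this is slow: a client-side join over normalised caches pays one round trip per dereference, and each hop depends on the result of the previous one. The cache shapes and names (Trade, Party, Trader) follow the slides, but the code is purely illustrative.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a "distributed join" done naively from the client.
// Each cache lives on the grid, so every get() below stands for a network round trip.
class NaiveDistributedJoin {

    private final Map<String, String> tradeToParty;   // remote Trade cache: tradeId -> partyId
    private final Map<String, String> partyToTrader;  // remote Party cache: partyId -> traderId
    private final Map<String, String> traderNames;    // remote Trader cache: traderId -> name

    NaiveDistributedJoin(Map<String, String> tradeToParty,
                         Map<String, String> partyToTrader,
                         Map<String, String> traderNames) {
        this.tradeToParty = tradeToParty;
        this.partyToTrader = partyToTrader;
        this.traderNames = traderNames;
    }

    // Re-assembling one trade's graph costs three sequential hops; N trades cost ~3N hops
    // spread over time, because each key depends on the previous result.
    List<String> traderNamesFor(List<String> tradeIds) {
        List<String> result = new ArrayList<>();
        for (String tradeId : tradeIds) {
            String partyId = tradeToParty.get(tradeId);    // hop 1
            String traderId = partyToTrader.get(partyId);  // hop 2
            result.add(traderNames.get(traderId));         // hop 3
        }
        return result;
    }
}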

OK, what if we held it all together? "Denormalised"

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some sub-entities

…and that means managing consistency over lots of copies

…and all the duplication means you run out of space really quickly

Space issues are exacerbated further when data is versioned

[Diagram: four denormalised copies of the Trade/Party/Trader graph, Version 1 through Version 4]

…and you need versioning to do MVCC

And reconstituting a previous time slice becomes very difficult.

[Diagram: many duplicated Trade, Party and Trader versions that must be stitched back together]

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

[Diagram: Trade, Party and Trader held on separate machines; each entity is independently versioned and data is a singleton]

Binding them back together involves a "distributed join" => lots of network hops.

[Diagram: the object graph spread over multiple nodes]

Whereas in the denormalised model the join is already done.

So what we want are the advantages of a normalised store at the speed of a denormalised one.

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.

[Diagram: entities related by common keys vs entities related by crosscutting keys]

We tackle this problem with a hybrid model. [Diagram: Trade is partitioned; Party and Trader are replicated]

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small

Facts have one key that relates them all (used to partition). Dimensions have many keys (which crosscut the partitioning key).

Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.

We remember we are a grid. We should avoid the distributed join… so we only want to 'join' data that is in the same process.

[Diagram: Trades and MTMs related via a common key]

Use a key assignment policy (e.g. KeyAssociation in Coherence), as sketched below.
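A minimal sketch of what such a key class might look like, assuming Oracle Coherence's KeyAssociation interface: the MTM key reports its parent trade id as the associated key, so the partitioning service places MTMs in the same partition, and hence the same process, as their Trade. The MtmKey class and its fields are illustrative, not ODC's actual model.

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Sketch: collocate MTM entries with the Trade that owns them by sharing the tradeId
// as the partitioning (associated) key.
public class MtmKey implements KeyAssociation, Serializable {

    private final String mtmId;
    private final String tradeId;   // the common key shared with the Trade fact

    public MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;             // partition by tradeId, not by the MTM's own id
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MtmKey && ((MtmKey) o).mtmId.equals(mtmId);
    }

    @Override
    public int hashCode() {
        return mtmId.hashCode();
    }
}

With something like this in place, a join between a Trade and its MTMs never has to leave the member that owns that tradeId.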

So we prescribe different physical storage for Facts and Dimensions. [Diagram: Trade partitioned; Party and Trader replicated]

Facts are partitioned, dimensions are replicated.

[Diagram: the Data Layer holds Transactions, Cashflows and MTMs in partitioned Fact Storage, beneath the Query Layer]

Facts are partitioned, dimensions are replicated.

[Diagram: Transactions, Cashflows and MTMs as Facts (distributed/partitioned); Dimensions replicated]

The data volumes back this up as a sensible hypothesis: Facts => big => distribute. Dimensions => small => replicate.

Key Point: We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

[Diagram: Replicate vs Distribute]

So how does this help us to run queries without distributed joins?

This query involves: • Joins between Dimensions • Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: sequential remote calls spread over the network and time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers]

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

[Diagram: Transactions, Cashflows and MTMs in partitioned storage]

Stage 1: Get the right keys to query the Facts by joining Dimensions in the Query Layer.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

[Diagram: the dimension join happens in the Query Layer, above the partitioned storage of Transactions, Cashflows and MTMs]

Stage 2: Cluster join to get Facts: join the Facts across the cluster, efficiently, as we know they are collocated.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

[Diagram: the fact join happens inside the partitioned storage]

Stage 3: Augment the raw Facts with relevant Dimensions: bind the relevant dimensions to the result in the Query Layer.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

(The whole three-stage flow is sketched below.)
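A hedged Java sketch of the three stages just described. The cache shapes and names (cost centre to book ids, facts keyed by a shared partitioning key) are illustrative stand-ins for ODC's replicated dimension caches and partitioned fact caches, not its real API.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the query flow: resolve the where clause locally, join facts per partition,
// then decorate the result with replicated dimension data.
class ThreeStageQuery {

    // Replicated dimensions: local to every query node, so these lookups are in-process.
    private final Map<String, Set<String>> costCentreToBookIds;
    private final Map<String, String> referenceDataByBookId;

    // Partitioned facts: keyed by the common partitioning key, so they are collocated.
    private final Map<String, List<String>> transactionsByBookId;
    private final Map<String, List<String>> mtmsByBookId;

    ThreeStageQuery(Map<String, Set<String>> costCentreToBookIds,
                    Map<String, String> referenceDataByBookId,
                    Map<String, List<String>> transactionsByBookId,
                    Map<String, List<String>> mtmsByBookId) {
        this.costCentreToBookIds = costCentreToBookIds;
        this.referenceDataByBookId = referenceDataByBookId;
        this.transactionsByBookId = transactionsByBookId;
        this.mtmsByBookId = mtmsByBookId;
    }

    List<String> run(String costCentre) {
        // Stage 1: join dimensions locally to turn "Cost Centre = 'CC1'" into fact keys.
        Set<String> bookIds = costCentreToBookIds.getOrDefault(costCentre, Set.of());

        List<String> rows = new ArrayList<>();
        for (String bookId : bookIds) {
            // Stage 2: join the facts; Transactions and MTMs share the partitioning key,
            // so this join happens inside a single partition, never across the cluster.
            List<String> txns = transactionsByBookId.getOrDefault(bookId, List.of());
            List<String> mtms = mtmsByBookId.getOrDefault(bookId, List.of());

            // Stage 3: bind the relevant (replicated) dimension data onto the result.
            String ref = referenceDataByBookId.get(bookId);
            for (int i = 0; i < Math.min(txns.size(), mtms.size()); i++) {
                rows.add(txns.get(i) + " | " + mtms.get(i) + " | " + ref);  // pair by position, for illustration only
            }
        }
        return rows;
    }
}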

Bringing it together.

[Diagram: the Java client API sits over replicated Dimensions and partitioned Facts]

We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.

We get to do this… [Diagram: normalised entities held separately across the cluster]

…and this… [Diagram: independent versions of each entity]

…and this… [Diagram: previous time slices reconstituted from the separate entities]

…without the problems of this… …or this…

…all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: Facts vs Dimensions]

This is a dimension: • It has a different key to the Facts • And it's BIG

We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem

Fortunately there is a simple solution: the Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: the Data Layer holds Transactions, Cashflows and MTMs in partitioned Fact Storage; the Processing Layer holds replicated Dimension Caches]

As new Facts are added, relevant Dimensions that they reference are moved to the processing layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered.

[Diagram: a Save Trade hits the partitioned cache in the Data Layer (all normalised); a cache store trigger pushes Party, Alias, Source, Book and Ccy towards the Query Layer's connected dimension caches]

This updates the connected caches.

[Diagram: the first-level dimensions (Party, Alias, Source, Book, Ccy) now sit in the Query Layer's connected dimension caches]

The process recurses through the object graph.

[Diagram: second-level references such as LedgerBook and further Parties are pulled into the connected caches too]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated (a sketch follows below).
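A hedged sketch of that recursion: when a fact is written, walk its foreign keys, copy each referenced dimension from the normalised store into the replicated connected cache, and recurse. The entity and cache shapes are illustrative, not ODC's real trigger code.

import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of Connected Replication: only dimensions actually reachable from a saved fact
// are copied into the replicated (connected) caches.
class ConnectedReplicator {

    // domain model arcs: entity key -> keys of the dimensions it references
    private final Map<String, List<String>> foreignKeys;
    // normalised dimension store (partitioned, in the data layer)
    private final Map<String, Object> dimensionStore;
    // replicated 'connected' dimension cache (in the query/processing layer)
    private final Map<String, Object> connectedCache;

    ConnectedReplicator(Map<String, List<String>> foreignKeys,
                        Map<String, Object> dimensionStore,
                        Map<String, Object> connectedCache) {
        this.foreignKeys = foreignKeys;
        this.dimensionStore = dimensionStore;
        this.connectedCache = connectedCache;
    }

    // Called by the cache-store trigger when a fact (e.g. a Trade) is saved.
    void onFactSaved(String factKey) {
        replicateReferences(factKey, new HashSet<>());
    }

    private void replicateReferences(String key, Set<String> visited) {
        for (String dimKey : foreignKeys.getOrDefault(key, List.of())) {
            if (!visited.add(dimKey)) {
                continue;                             // already handled; also guards against cycles
            }
            connectedCache.put(dimKey, dimensionStore.get(dimKey));  // this dimension is now 'connected'
            replicateReferences(dimKey, visited);     // recurse through the object graph
        }
    }
}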

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:
• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so that we can do any join in a single step.

[Diagram: partitioned storage]

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The End
• Further details online: http://www.benstopford.com
• Questions?

2 The Shared Nothing Architecture

• Massive storage potential
• Massive scalability of processing
• Popular for high-level storage solutions
• Commodity hardware
• Around since the 80s, but only really popular since the Big Data era
• Limited by cross-partition joins

Each machine is responsible for a subset of the records. Each record exists on only one machine.

[Diagram: record ranges (1, 2, 3…), (97, 98, 99…), (169, 170…), (244, 245…), (333, 334…), (765, 769…) spread across separate nodes, with a client connecting to the cluster]
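The routing rule behind this picture is just a deterministic function from record key to owning node. A minimal sketch in Java (hypothetical names, simple hash-modulo rather than whatever partitioning scheme a real grid product uses):

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of shared-nothing ownership: each record key maps to exactly
// one node, so every node holds a disjoint subset of the records.
public class PartitionRouter {
    private final int nodeCount;

    public PartitionRouter(int nodeCount) { this.nodeCount = nodeCount; }

    // Deterministic owner for a key: the same key always routes to the same node.
    public int ownerOf(Object key) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(4);
        Map<Integer, Integer> placement = new HashMap<>();
        for (int recordId : new int[]{1, 2, 3, 97, 98, 99, 765, 769}) {
            placement.put(recordId, router.ownerOf(recordId));
        }
        System.out.println(placement); // each record lives on exactly one node
    }
}

Real grids layer consistent hashing or a partition table on top of this idea so that adding a node only moves a fraction of the keys, but the contract is the same: one owner per record.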

3 The In Memory Database (single address-space)

Databases must cache subsets of the data in memory.

Not knowing what you don't know: most queries still go to disk to "see what they missed".

[Diagram: 90% of the data in cache, the rest on disk]

If you can fit it ALL in memory, you know everything.

The architecture of an in memory database:

• All data is at your fingertips
• Query plans become less important as there is no IO
• Intermediary results are just pointers
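A toy illustration of that last point, assuming a simple map-based index (not any particular product's API): when everything lives in one address space, a result set is just a list of references to objects that are already there.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: an in-memory index returns pointers to resident objects, so there
// is no IO and no copying of rows to build an intermediate result.
public class InMemoryIndex {
    record Trade(long id, String costCentre, double notional) {}

    private final Map<String, List<Trade>> byCostCentre = new HashMap<>();

    public void add(Trade trade) {
        byCostCentre.computeIfAbsent(trade.costCentre(), k -> new ArrayList<>()).add(trade);
    }

    // The returned list holds references to the same Trade objects held by the index.
    public List<Trade> tradesFor(String costCentre) {
        return byCostCentre.getOrDefault(costCentre, List.of());
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        index.add(new Trade(1, "CC1", 1_000_000));
        index.add(new Trade(2, "CC2", 250_000));
        System.out.println(index.tradesFor("CC1"));
    }
}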

Memory is at least 100x faster than disk

[Chart: access latencies on a scale from picoseconds to milliseconds – L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, cross-network round trip, cross-continental round trip, 1MB from disk/network]

An L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel 20cm.

Random vs Sequential Access: memory allows random access; disk only works well for sequential reads.

This makes them very fast.

The proof is in the stats: TPC-H benchmarks on a 1TB data set
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

NB – TPC-H is a decision support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc Supercluster.

So why haven't in-memory databases taken off?

Address-spaces are relatively small and of a finite, fixed size.

• What happens when your data grows beyond your available memory? The 'one more bit' problem.

Durability: what happens when you pull the plug?

One solution is distribution.

Distributed In Memory (Shared Nothing)

Again we spread our data, but this time only using RAM.

[Diagram: the same record ranges as before, now partitioned across nodes entirely in RAM, with a client connecting to the cluster]

Distribution solves our two problems:

• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing, but at the cost of losing the single address space.

[Diagram: the architecture landscape – Traditional, Shared Disk, In Memory, Shared Nothing and Distributed In Memory – arranged along an axis of progressively simpler contracts]

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is Latency? Latency is a measure of response time.

What is Throughput? Throughput is a measure of the consumption of work/messages in a prescribed amount of time.

Which is best for latency? [Diagram: a latency spectrum running between the traditional database and the shared nothing (distributed) in-memory database]

Which is best for throughput? [Diagram: a throughput spectrum running between the traditional database and the shared nothing (distributed) in-memory database]

So why do we use distributed in-memory? Being in memory addresses latency; plentiful hardware addresses throughput.

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised

• Realtime Graph DB
• 450 processes
• Messaging (topic based) as a system of record (persistence)
• 2TB of RAM

The Layers

[Diagram: an Access Layer exposing a Java client API, a Query Layer, a Data Layer holding Transactions, Cashflows and MTMs, and a Persistence Layer]

Three Tools of Distributed Data Architecture:

• Indexing
• Replication
• Partitioning

How should we use these tools?

Replication puts data everywhere. Wherever you go, the data will be there. But your storage is limited by the memory on a node.

Partitioning scales: scalable storage, bandwidth and processing (e.g. keys Aa–Ap assigned to one partition). But associating data in different partitions implies moving it.

So we have some data. Our data is bound together in a model.

[Diagram: a Trade linked to a Party and a Trader, which in turn link to Desk, Name and Sub entities]

Which we save.

[Diagram: the Trade, Party and Trader entities written out separately]

Binding them back together involves a "distributed join" => lots of network hops.

The hops have to be spread over time. [Diagram: each hop crossing the network in turn.] Lots of network hops makes it slow.

OK – what if we held it all together, "denormalised"?

Hence denormalisation is FAST (for reads).

But denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.

Space issues are exaggerated further when data is versioned…

[Diagram: the same Trade/Party/Trader graph duplicated for Versions 1 to 4]

…and you need versioning to do MVCC.

And reconstituting a previous time slice becomes very difficult.

[Diagram: overlapping versions of Trades, Parties and Traders that must be stitched back together]

So we want to hold entities separately (normalised), to alleviate concerns around consistency and space usage.

Remember, this means the object graph will be split across multiple machines.

[Diagram: Trade, Party and Trader held on different machines – independently versioned, each piece of data a singleton]

Binding them back together involves a "distributed join" => lots of network hops. Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys. We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.

We tackle this problem with a hybrid model: the Trade is partitioned; Party and Trader are replicated.

We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions. Everything starts from a Core Fact (Trades for us).

Facts are big; dimensions are small. Facts have one key that relates them all (used to partition). Dimensions have many keys (which crosscut the partitioning key).

Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.

We remember we are a grid. We should avoid the distributed join… so we only want to 'join' data that is in the same process. Trades and MTMs share a common key, so we use a Key Assignment Policy (e.g. KeyAssociation in Coherence) to keep them in the same partition.
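In Coherence the usual way to express this is key association: the MTM's key declares the trade id it belongs to, and the partitioning service then places both entries in the same partition. A minimal sketch with hypothetical key classes (check the KeyAssociation/KeyAssociator contract in your Coherence version for the exact details):

import com.tangosol.net.cache.KeyAssociation;

// Hypothetical MTM key: keyed by its own id, but associated with the owning
// trade's id so that an MTM always lands in the same partition as its Trade.
public class MtmKey implements KeyAssociation {
    private final long mtmId;
    private final long tradeId;   // the Fact's partitioning key

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;           // collocate with the Trade entry keyed by tradeId
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MtmKey other && other.mtmId == mtmId && other.tradeId == tradeId;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(mtmId) * 31 + Long.hashCode(tradeId);
    }
}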

So we prescribe different physical storage for Facts and Dimensions: the Trade (Fact) is partitioned; Party and Trader (Dimensions) are replicated.

Facts are partitioned, dimensions are replicated.

[Diagram: the Data Layer holding Transactions, Cashflows and MTMs in partitioned Fact storage, with the Dimensions replicated in the Query Layer]

The data volumes back this up as a sensible hypothesis: Facts => big => distribute; Dimensions => small => replicate.

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

So how does this help us to run queries without distributed joins?

Consider a query that involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern? Get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs – hop after hop across the network, spread over time.

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause (Where Cost Centre = 'CC1'). Join the Dimensions in the Query Layer to get the right keys to query the Facts.

Stage 2: Cluster join to get the Facts. Join the facts together efficiently, as we know they are collocated in the partitioned storage.

Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result in the Query Layer.
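The three stages map onto quite ordinary code. A rough sketch in plain Java, with hypothetical index names standing in for the query and data layers (not ODC's actual API):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the three query stages: stage 1 runs entirely against replicated
// dimension data, stage 2 joins facts that are collocated by trade id, and
// stage 3 decorates the joined facts with dimensions again.
public class SnowflakeQuery {
    record Transaction(long tradeId, String bookId) {}
    record Mtm(long mtmId, long tradeId, double value) {}

    Map<String, Set<String>> booksByCostCentre = new HashMap<>();        // replicated dimensions
    Map<String, List<Transaction>> transactionsByBook = new HashMap<>(); // partitioned facts, indexed
    Map<Long, List<Mtm>> mtmsByTradeId = new HashMap<>();                // partitioned facts, collocated with the trade

    public List<Object[]> run(String costCentre) {
        // Stage 1: join dimensions in the query layer to get the right fact keys.
        Set<String> books = booksByCostCentre.getOrDefault(costCentre, Set.of());

        // Stage 2: cluster join of the facts; Transactions and MTMs for a trade
        // share a partitioning key, so this join never crosses the network.
        List<Transaction> transactions = books.stream()
                .flatMap(book -> transactionsByBook.getOrDefault(book, List.of()).stream())
                .collect(Collectors.toList());

        // Stage 3: bind the relevant (replicated) dimensions onto the result.
        return transactions.stream()
                .flatMap(tx -> mtmsByTradeId.getOrDefault(tx.tradeId(), List.of()).stream()
                        .map(mtm -> new Object[]{tx, mtm, costCentre}))
                .collect(Collectors.toList());
    }
}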

Bringing it together: a Java client API over Replicated Dimensions and Partitioned Facts. We never have to do a distributed join.

So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.

We get to do this… [normalised, singleton object graphs]

…and this… [independent versions of each entity]

…and this… [reconstituting a previous time slice]

…without the problems of this… [managing consistency over lots of copies]

…or this… [running out of space through duplication]

…all at the speed of this… well, almost. [the denormalised, pre-joined model]

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: part of the model labelled Facts, part labelled Dimensions]

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store, we keep our 'Connected Caches' up to date.

[Diagram: the Data Layer holding Transactions, Cashflows and MTMs in partitioned Fact storage, with replicated Dimension Caches in the processing layer]

As new Facts are added, relevant Dimensions that they reference are moved to the processing layer caches. The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

[Diagram: a Save Trade call hits the partitioned cache in the Data Layer (all normalised); a cache store trigger fires for the Trade's direct references – Party, Alias, SourceBook, Ccy]

This updates the connected caches in the Query Layer (which holds the connected dimension caches).

The process recurses through the object graph: the Party leads on to its own Party and LedgerBook references, and so on.

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
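As a sketch of the mechanics, the pattern is little more than a bounded graph walk that runs whenever a fact is written (hypothetical types below; in ODC this is driven off the partitioned cache's cache-store trigger rather than called directly):

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of Connected Replication: when a fact is saved, recurse through its
// references and ensure every dimension reached exists in the replicated cache.
public class ConnectedReplicator {
    interface Entity {
        String key();
        List<Entity> references();   // the arcs / foreign keys of the domain model
    }

    private final Map<String, Entity> connectedCache = new HashMap<>(); // replicated layer

    public void onFactSaved(Entity fact) {
        Set<String> visited = new HashSet<>();
        for (Entity dimension : fact.references()) {
            replicate(dimension, visited);
        }
    }

    private void replicate(Entity dimension, Set<String> visited) {
        if (!visited.add(dimension.key())) {
            return;                                      // already walked; also guards against cycles
        }
        connectedCache.put(dimension.key(), dimension);  // only 'connected' data is copied
        for (Entity next : dimension.references()) {
            replicate(next, visited);                    // recurse through the object graph
        }
    }
}

Dimensions that no fact ever references are simply never visited, which is what keeps the replicated footprint to a fraction of the raw dimension data.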

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:

• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so that we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
Databases must cache subsets of the data in memory

Cache

Not knowing what you don't know

Most queries still go to disk to "see what they missed"

Data on Disk

90% in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

• All data is at your fingertips

• Query plans become less important as there is no IO

• Intermediary results are just pointers
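To make the last point concrete, here is a minimal, hypothetical Java sketch (not from the deck itself): a store whose secondary index holds references to the same objects as the primary map, so an intermediate result is just a list of pointers rather than a copy of the data.

import java.util.*;

record Row(long id, String costCentre, double notional) {}

class InMemoryStore {
    private final Map<Long, Row> byId = new HashMap<>();
    // Secondary index holding references to the very same Row objects.
    private final Map<String, List<Row>> byCostCentre = new HashMap<>();

    void put(Row r) {
        byId.put(r.id(), r);
        byCostCentre.computeIfAbsent(r.costCentre(), k -> new ArrayList<>()).add(r);
    }

    Row byId(long id) { return byId.get(id); }

    // The 'intermediate result' is a list of pointers into the store, not a copy.
    List<Row> matching(String costCentre) {
        return byCostCentre.getOrDefault(costCentre, List.of());
    }
}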

Memory is at least 100x faster than disk

[Latency chart: L1 cache ref, L2 cache ref, main memory ref, 1MB read from main memory, cross-network round trip, cross-continental round trip, 1MB read from disk/network, plotted on a ps / ns / μs / ms scale]

L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel 20cm.

Random vs Sequential Access: Memory allows random access. Disk only works well for sequential reads.

This makes them very fast

The proof is in the stats: TPC-H Benchmarks on a 1TB data set
• Exasol: 4,253,937 QphH (In-Memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

• NB – TPC-H is a decision support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc Supercluster.

So why haven't in-memory databases taken off?

Address-spaces are relatively small and of a finite, fixed size

• What happens when your data grows beyond your available memory? The 'One more bit' problem

Durability

What happens when you pull the plug?

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data, but this time only using RAM

[Diagram: key ranges (1, 2, 3…, 97, 98, 99…, 169, 170…, 244, 245…, 333, 334…, 765, 769…) spread across the RAM of several machines, with a client talking to the cluster]
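As a rough illustration of what "spreading the data over the RAM of many machines" means, here is a minimal, hypothetical sketch (not from the deck) using simple modulo hashing; real grids use partition tables and consistent hashing so that data does not all move when nodes join or leave.

import java.util.*;

// Toy shared-nothing store: each 'node' owns a slice of the key space in its own RAM.
class PartitionedStore<K, V> {
    private final List<Map<K, V>> nodes;

    PartitionedStore(int nodeCount) {
        nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
    }

    private Map<K, V> ownerOf(K key) {
        // Assumption: plain modulo hashing for clarity; production grids
        // rebalance partitions rather than rehashing everything.
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }

    void put(K key, V value) { ownerOf(key).put(key, value); }
    V get(K key)            { return ownerOf(key).get(key); }
}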

Distribution solves our two problems

• Solve the 'one more bit' problem by adding more hardware

• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of losing the single address space

[Diagram: Traditional, Shared Disk, Shared Nothing, In Memory and Distributed In Memory architectures plotted against the 'Simpler Contract' axis]

Key Point 4: There are three key forces

Distribution: Gain scalability through a distributed architecture

Simplify the contract: Improve scalability by picking appropriate ACID properties

No Disk: All data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse

ODC

ODC represents a balance between throughput and latency

What is Latency?

Latency is a measure of response time

What is Throughput?

Throughput is a measure of the consumption of work/messages in a prescribed amount of time

Which is best for latency?

[Latency spectrum: Traditional Database at one end, Shared Nothing (Distributed) In-Memory Database at the other]

Which is best for throughput?

[Throughput spectrum: Traditional Database at one end, Shared Nothing (Distributed) In-Memory Database at the other]

So why do we use distributed in-memory?

In Memory => Latency; Plentiful hardware => Throughput

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record (persistence)

2TB of RAM

The Layers

Data Layer: Transactions, Cashflows, Mtms

Query Layer

Access Layer: Java client API

Persistence Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools?

Replication puts data everywhere

Wherever you go, the data will be there

But your storage is limited by the memory on a node

Partitioning scales: Keys Aa-Ap

Scalable storage, bandwidth and processing

Associating data in different partitions implies moving it

So we have some data. Our data is bound together in a model

[Domain model diagram: a Trade referencing Party and Trader, which in turn reference Desk, Name and other sub-entities]
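A minimal, hypothetical sketch of such a model (the field names are assumptions, not ODC's): entities refer to one another by key, which is exactly what later forces either a distributed join or co-location.

// Entities hold foreign keys rather than nested objects, so re-assembling a
// Trade means looking up its Party and Trader by key.
record Trade(long tradeId, long partyId, long traderId, double notional) {}
record Party(long partyId, String name) {}
record Trader(long traderId, String name, String desk) {}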

Which we save

[Diagram: the Trade, Party and Trader objects end up saved on different machines]

Binding them back together involves a "distributed join" => lots of network hops

[Diagram: Trade, Party and Trader joined back together across nodes]

The hops have to be spread over time

[Diagram: each hop is a separate round trip on the network/time axis]

Lots of network hops makes it slow

OK – what if we held it all together? "Denormalised"

Hence denormalisation is FAST (for reads)

Denormalisation implies the duplication of some sub-entities

…and that means managing consistency over lots of copies

…and all the duplication means you run out of space really quickly

Space issues are exaggerated further when data is versioned

[Diagram: the denormalised Trade/Party/Trader document duplicated as Version 1, Version 2, Version 3 and Version 4]

…and you need versioning to do MVCC

And reconstituting a previous time slice becomes very difficult

[Diagram: overlapping versions of Trade, Party and Trader objects that must be stitched back into one consistent historical view]
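To illustrate the bookkeeping that versioning brings, here is a small, hypothetical sketch (not from the deck) of a versioned store with an 'as of' read; reconstituting a time slice means doing this floor-lookup for every entity in the graph.

import java.util.*;

// Hypothetical versioned store: each logical key maps to all of its versions,
// and an 'as of' read must pick the right version of every entity it touches.
class VersionedStore<K, V> {
    record Versioned<T>(long version, long validFromMillis, T value) {}

    private final Map<K, NavigableMap<Long, Versioned<V>>> data = new HashMap<>();

    void put(K key, long version, long validFromMillis, V value) {
        data.computeIfAbsent(key, k -> new TreeMap<>())
            .put(validFromMillis, new Versioned<>(version, validFromMillis, value));
    }

    // Latest version that was already valid at the requested instant, or null.
    V asOf(K key, long timestampMillis) {
        NavigableMap<Long, Versioned<V>> versions = data.get(key);
        if (versions == null) return null;
        Map.Entry<Long, Versioned<V>> entry = versions.floorEntry(timestampMillis);
        return entry == null ? null : entry.getValue().value();
    }
}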

So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

[Diagram: Trade, Party and Trader held on different nodes]

Independently Versioned

Data is Singleton

Binding them back together involves a "distributed join" => lots of network hops

[Diagram: joining Trade, Party and Trader back together across the cluster]

Whereas in the denormalised model the join is already done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about

Looking more closely: Why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate

Common Keys

Crosscutting Keys

We tackle this problem with a hybrid model

[Diagram: Trade partitioned; Party and Trader replicated]

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small

Facts have one key that relates them all (used to partition)

Dimensions have many keys (which crosscut the partitioning key)

Looking at the data

Facts => big, common keys

Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join

… so we only want to 'join' data that is in the same process

Trades

MTMs

Common Key

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
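As a sketch of what such a policy can look like in Oracle Coherence (the MtmKey class and its fields are assumptions; only the KeyAssociation mechanism is named in the deck): the MTM's cache key returns the owning trade id as its associated key, so the MTM is stored in the same partition as its Trade and a join between them never leaves the process.

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Hypothetical cache key for an MTM fact. By associating it with the trade id,
// the MTM lands in the same Coherence partition as its parent Trade.
public class MtmKey implements KeyAssociation, Serializable {
    private final long mtmId;
    private final long tradeId;   // the common, partitioning key

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;            // co-locate with the Trade keyed by tradeId
    }

    // equals()/hashCode() omitted for brevity; real cache keys need both.
}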

So we prescribe different physical storage for Facts and Dimensions

[Diagram: Trade partitioned; Party and Trader replicated]

Facts are partitioned, dimensions are replicated

[Diagram: Data Layer with Transactions, Cashflows and Mtms in partitioned Fact Storage; the Query Layer holds the replicated Trade, Party and Trader dimensions]

Facts are partitioned, dimensions are replicated

[Diagram: Transactions, Cashflows and Mtms as Facts (distributed/partitioned); Dimensions (replicated)]

The data volumes back this up as a sensible hypothesis

Facts => big => distribute

Dimensions => small => replicate
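A toy contrast between the two storage prescriptions, as a hypothetical Java sketch (complementing the PartitionedStore sketch above, and not an ODC or Coherence API): a replicated store keeps a full copy on every node, so dimension reads are always local, but its capacity is capped by the memory of a single node.

import java.util.*;

// Every 'node' holds a FULL copy of the map, so any node can read a dimension
// locally; the trade-off is that total capacity is that of one node's RAM.
class ReplicatedStore<K, V> {
    private final List<Map<K, V>> fullCopies = new ArrayList<>();

    ReplicatedStore(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) fullCopies.add(new HashMap<>());
    }

    void put(K key, V value) {
        // A write is broadcast to every node's copy.
        for (Map<K, V> copy : fullCopies) copy.put(key, value);
    }

    V getFromNode(int node, K key) {
        // A read never leaves the chosen node.
        return fullCopies.get(node).get(key);
    }
}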

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key

Replicate

Distribute

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

Get Cost Centers, get Ledger Books, get Source Books, get Transactions, get MTMs, get Legs, get Cost Centers

[Diagram: each of those gets is a separate hop on the network/time axis]

But by balancing Replication and Partitioning we don't need all those hops

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

[Partitioned Storage: Transactions, Cashflows, Mtms]

Stage 1: Get the right keys to query the Facts

Join Dimensions in the Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Cluster Join to get Facts

Join Dimensions in the Query Layer, then join Facts across the cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated

[Partitioned Storage: Transactions, Cashflows, Mtms]

Stage 3: Augment raw Facts with relevant Dimensions

Join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
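A hypothetical sketch of the three stages in code (cache and method names are assumptions, not the ODC API): replicated dimensions are joined locally to turn the where clause into fact keys, the partitioned facts are fetched and joined in place, and the results are enriched from the replicated dimension caches.

import java.util.*;
import java.util.stream.*;

class ThreeStageQuery {
    // Replicated dimension data: every query-layer node holds a full local copy.
    Map<String, Set<Long>> tradeIdsByCostCentre = new HashMap<>();
    Map<Long, String> referenceDataByTradeId = new HashMap<>();

    // Partitioned fact data, keyed by the common partitioning key (trade id).
    Map<Long, Object> transactionsByTradeId = new HashMap<>();
    Map<Long, Object> mtmsByTradeId = new HashMap<>();

    List<Object[]> run(String costCentre) {
        // Stage 1: resolve the where clause against local, replicated dimensions
        // to obtain the partitioning keys of the facts we need.
        Set<Long> tradeIds = tradeIdsByCostCentre.getOrDefault(costCentre, Set.of());

        return tradeIds.stream()
                // Stage 2: fetch and join the facts; rows sharing a trade id are
                // collocated, so this join never crosses partitions.
                // Stage 3: bind the relevant replicated dimensions to each row.
                .map(id -> new Object[] {
                        transactionsByTradeId.get(id),
                        mtmsByTradeId.get(id),
                        referenceDataByTradeId.get(id)
                })
                .collect(Collectors.toList());
    }
}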

Bringing it together

[Diagram: the Java client API talks to Replicated Dimensions and Partitioned Facts]

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results

We get to do this…

[Diagram: the normalised Trade / Party / Trader graph]

…and this…

[Diagram: independently versioned Trades, Parties and Traders, Version 1 to Version 4]

…and this…

[Diagram: reconstituting a previous time slice across Trades, Parties and Traders]

…without the problems of this…

…or this…

…all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

Facts

Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution: the Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data, some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

[Diagram: Data Layer with Transactions, Cashflows and Mtms in partitioned Fact Storage; Processing Layer holding replicated Dimension Caches]

As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its first-level references to be triggered

[Diagram: a Save Trade call hits the Partitioned Cache in the Data Layer (all normalised); a Cache Store trigger fires for the Trade's first-level references – Party, Alias, Source, Book, Ccy]

This updates the connected caches

[Diagram: the referenced Party, Alias, Source, Book and Ccy dimensions are pushed up to the Query Layer's connected dimension caches]

The process recurses through the object graph

[Diagram: the recursion continues from Party on to LedgerBook and through the rest of the graph, from the Data Layer (all normalised) into the Query Layer (with connected dimension caches)]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
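A hypothetical sketch of that recursion (the entity type and the foreignKeys accessor are assumptions): when a fact is saved its foreign keys are followed transitively, and each referenced dimension is copied into the replicated 'connected' cache the first time it is seen.

import java.util.*;
import java.util.function.Function;

class ConnectedReplicator<K, E> {
    private final Map<K, E> normalisedStore;              // data layer: everything, partitioned
    private final Map<K, E> connectedDimensionCache;       // query layer: replicated, 'connected' only
    private final Function<E, Collection<K>> foreignKeys;  // the arcs of the domain model

    ConnectedReplicator(Map<K, E> normalisedStore,
                        Map<K, E> connectedDimensionCache,
                        Function<E, Collection<K>> foreignKeys) {
        this.normalisedStore = normalisedStore;
        this.connectedDimensionCache = connectedDimensionCache;
        this.foreignKeys = foreignKeys;
    }

    // Called when a fact (e.g. a Trade) is saved: recurse through its references.
    void onFactSaved(E fact) {
        for (K key : foreignKeys.apply(fact)) {
            E dimension = normalisedStore.get(key);
            if (dimension != null && connectedDimensionCache.putIfAbsent(key, dimension) == null) {
                // Only recurse the first time a dimension becomes 'connected',
                // which also keeps cycles in the graph from looping forever.
                onFactSaved(dimension);
            }
        }
    }
}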

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

• At one end of the scale are the huge shared nothing architectures. These favour scalability

• At the other end are in memory architectures, ideally using a single address space

• You can blend the two approaches (for example ODC)

• ODC attacks the Distributed Join Problem in an unusual way

• By balancing Replication and Partitioning we can do any join in a single step

• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End

• Further details online: http://www.benstopford.com

• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

[Latency chart, ps → ns → μs → ms: L1 cache ref, L2 cache ref, main memory ref, 1MB read from main memory, cross-network round trip, cross-continental round trip, 1MB read from disk/network]

L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel 20cm

Random vs Sequential Access: Memory allows random access. Disk only works well for sequential reads

This makes them very fast

The proof is in the stats: TPC-H Benchmarks on a 1TB data set
• Exasol: 4,253,937 QphH (In-Memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

• NB – TPC-H is a decision support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc Supercluster.

So why haven't in-memory databases taken off?

Address-Spaces are relatively small and of a finite, fixed size

• What happens when your data grows beyond your available memory?

The 'One more bit problem'

Durability

What happens when you pull the plug?


One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

[Diagram: key ranges 1,2,3…, 97,98,99…, 169,170…, 244,245…, 333,334…, 765,769… spread across nodes, accessed by a Client]

Distribution solves our two problems

• Solve the 'one more bit' problem by adding more hardware

• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of losing the single address space

[Diagram: Traditional, Shared Disk, In Memory, Distributed In-Memory, Shared Nothing, arranged along the 'Simpler Contract' axis]

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No Disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse

ODC

ODC represents a balance between throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of work/messages in a prescribed amount of time

Which is best for latency

[Diagram: latency spectrum from Traditional Database to Shared Nothing (Distributed) In-Memory Database]

Which is best for throughput

[Diagram: throughput spectrum from Traditional Database to Shared Nothing (Distributed) In-Memory Database]

So why do we use distributed in-memory

[Diagram: In Memory plus plentiful hardware delivers both low latency and high throughput]

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised

• Realtime Graph DB
• 450 processes
• Messaging (Topic Based) as a system of record (persistence)
• 2TB of RAM

The Layers: [Diagram: Access Layer (Java client API) → Query Layer → Data Layer (Transactions, Cashflows, Mtms) → Persistence Layer]

Three Tools of Distributed Data Architecture: Indexing, Replication, Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scales [Diagram: key range Aa-Ap held on a single node]

Scalable storage, bandwidth and processing

Associating data in different partitions implies moving it

So we have some data. Our data is bound together in a model [Diagram: Trade linked to Party and Trader, with Desk, Name, Sub… entities]

Which we save [Diagram: Trade, Party, Trader entities saved across separate nodes]

Binding them back together involves a "distributed join" => lots of network hops [Diagram: Trade, Party, Trader on separate nodes]

The hops have to be spread over time [Diagram: hops laid out along network and time axes]

Lots of network hops makes it slow

OK – what if we held it all together? "Denormalised"

Hence denormalisation is FAST (for reads)

Denormalisation implies the duplication of some sub-entities

…and that means managing consistency over lots of copies

…and all the duplication means you run out of space really quickly

Space issues are exaggerated further when data is versioned [Diagram: Trade, Party, Trader duplicated for Versions 1 to 4]

…and you need versioning to do MVCC

And reconstituting a previous time slice becomes very difficult [Diagram: versions of Trade, Party and Trader scattered across nodes]

So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

[Diagram: Trade, Party, Trader held on separate nodes – independently versioned, each datum a singleton]

Binding them back together involves a "distributed join" => lots of network hops [Diagram: Trade, Party, Trader on separate nodes]

Whereas in the denormalised model the join is already done

So what we want are the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about

Looking more closely: Why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate [Diagram: common keys vs crosscutting keys]

We tackle this problem with a hybrid model [Diagram: Trade partitioned; Party and Trader replicated]

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small

Facts have one key that relates them all (used to partition)

Dimensions have many keys (which crosscut the partitioning key)
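To make that split concrete, here is a minimal Java sketch of how such entities might be keyed; the record and field names are illustrative only, not ODC's actual model:

// Facts: large, all carrying the shared partitioning key (tradeId here),
// so related facts can be collocated in the same partition.
record Trade(long tradeId, long partyId, long traderId, double notional) {}
record Cashflow(long cashflowId, long tradeId, double amount) {}
record Mtm(long mtmId, long tradeId, double value) {}

// Dimensions: small, with their own keys. The same partyId or traderId can be
// referenced by trades in every partition, i.e. these keys crosscut the partitioning key.
record Party(long partyId, String name) {}
record Trader(long traderId, String name, String desk) {}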

Looking at the data:
Facts => big, common keys
Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join… so we only want to 'join' data that is in the same process

Trades and MTMs share a Common Key – use a Key Assignment Policy (e.g. KeyAssociation in Coherence) to keep them in the same partition
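A rough sketch of such a key class, assuming Oracle Coherence's KeyAssociation interface; the MtmKey class and its fields are invented for this example and are not ODC code:

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Key for an MTM fact. Returning the parent trade's id as the associated key
// asks Coherence to store this entry in the same partition as that Trade,
// so Trade-to-MTM 'joins' never leave the process.
public class MtmKey implements KeyAssociation, Serializable {
    private final long mtmId;
    private final long tradeId;   // the common (partitioning) key

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;           // collocate with the parent Trade
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MtmKey k && k.mtmId == mtmId && k.tradeId == tradeId;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(mtmId) * 31 + Long.hashCode(tradeId);
    }
}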

So we prescribe different physical storage for Facts and Dimensions [Diagram: Trade partitioned; Party and Trader replicated]

Facts are partitioned, dimensions are replicated

[Diagram: Data Layer and Query Layer – Transactions, Cashflows, Mtms held in Fact Storage (Partitioned); Trade, Party, Trader in the model]

Facts are partitioned, dimensions are replicated [Diagram: Transactions, Cashflows, Mtms as Facts (distributed/partitioned) in Fact Storage; Dimensions (replicated)]

The data volumes back this up as a sensible hypothesis:
Facts => big => distribute
Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key

So how do they help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern

[Diagram: sequential hops over the network and time – Get Cost Centres, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centres]

But by balancing Replication and Partitioning we don't need all those hops

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1' [Diagram: Transactions, Cashflows, Mtms in Partitioned Storage]

Stage 1: Get the right keys to query the Facts – join Dimensions in the Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1' [Diagram: Transactions, Cashflows, Mtms in Partitioned Storage]

Stage 2: Cluster Join to get Facts – join Dimensions in the Query Layer, join Facts across the cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated [Diagram: Transactions, Cashflows, Mtms in Partitioned Storage]

Stage 3: Augment raw Facts with relevant Dimensions – join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
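A self-contained Java sketch of the three stages, using plain collections to stand in for the replicated dimension caches and the partitioned fact storage; all class and field names are invented for illustration:

import java.util.*;
import java.util.stream.*;

public class ThreeStageQuerySketch {

    // Facts share the transaction id as their partitioning key.
    record Transaction(long txnId, String sourceBook) {}
    record Mtm(long txnId, double value) {}

    public static void main(String[] args) {
        // Replicated dimensions: available locally in every query-layer process.
        Map<String, Set<String>> ledgerBooksByCostCentre = Map.of("CC1", Set.of("LB1"));
        Map<String, Set<String>> sourceBooksByLedgerBook = Map.of("LB1", Set.of("SB1"));

        // Partitioned facts: a Transaction and its MTM are collocated via txnId.
        List<Transaction> transactions =
                List.of(new Transaction(1, "SB1"), new Transaction(2, "SB9"));
        Map<Long, Mtm> mtmByTxn = Map.of(1L, new Mtm(1, 100.0), 2L, new Mtm(2, 50.0));

        // Stage 1: resolve the where clause (Cost Centre = 'CC1') against the
        // replicated dimensions to get the keys we need - no network hops.
        Set<String> sourceBooks = ledgerBooksByCostCentre.getOrDefault("CC1", Set.of()).stream()
                .flatMap(lb -> sourceBooksByLedgerBook.getOrDefault(lb, Set.of()).stream())
                .collect(Collectors.toSet());

        // Stage 2: join the facts; Transactions and MTMs with the same txnId sit
        // in the same partition, so this join happens locally in each partition.
        Map<Long, Mtm> joined = transactions.stream()
                .filter(t -> sourceBooks.contains(t.sourceBook()))
                .collect(Collectors.toMap(Transaction::txnId, t -> mtmByTxn.get(t.txnId())));

        // Stage 3: bind the replicated dimension data back onto the raw facts.
        joined.forEach((txn, mtm) ->
                System.out.println("Txn " + txn + ", MTM " + mtm.value()
                        + ", cost centre CC1, source books " + sourceBooks));
    }
}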

Bringing it together [Diagram: Java client API sitting over Replicated Dimensions and Partitioned Facts]

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results

We get to do this… [Diagram: normalised Trade, Party, Trader entities]

…and this… [Diagram: Trade, Party, Trader across Versions 1 to 4]

…and this [Diagram: reconstituting a previous time slice from versions of Trade, Party and Trader]

…without the problems of this…

…or this

…all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

[Diagram: Facts vs Dimensions] This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data, some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date [Diagram: Data Layer with Fact Storage (Partitioned) – Transactions, Cashflows, Mtms – and Processing Layer with replicated Dimension Caches]

As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st level references to be triggered

[Diagram: Save Trade → Partitioned Cache → Cache Store → Trigger; the Trade references Party, Alias, Source, Book, Ccy across the Data Layer (all normalised) and Query Layer (with connected dimension caches)]

This updates the connected caches

[Diagram: the connected caches for Party, Alias, Source, Book, Ccy are refreshed in the Query Layer]

The process recurses through the object graph

[Diagram: the recursion continues from Party to LedgerBook and on through the object graph, from the Data Layer (all normalised) to the Query Layer (with connected dimension caches)]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
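A minimal sketch of that recursion, assuming each saved entity can report the entities it references; the interface and class names are invented for illustration and are not the ODC API:

import java.util.*;

// An entity in the domain model, able to report its outgoing foreign-key arcs.
interface Entity {
    Object key();
    Collection<Entity> references();
}

class ConnectedReplicator {
    // Stand-in for the replicated 'connected dimension' caches in the query layer.
    private final Map<Object, Entity> connectedCache = new HashMap<>();

    // Triggered when a fact (e.g. a Trade) is written to the partitioned store.
    void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicate(dimension, new HashSet<>());
        }
    }

    // Recurse through the foreign keys, replicating each dimension the first time
    // it is seen, so only 'connected' dimensions ever reach the replicated layer.
    private void replicate(Entity dimension, Set<Object> visited) {
        if (!visited.add(dimension.key())) {
            return;                              // already walked - guards against cycles
        }
        connectedCache.putIfAbsent(dimension.key(), dimension);
        for (Entity next : dimension.references()) {
            replicate(next, visited);
        }
    }
}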

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning, so we can do any join in a single step

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End

• Further details online: http://www.benstopford.com

• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some sub-entities

…and that means managing consistency over lots of copies

…and all the duplication means you run out of space really quickly

Space issues are exacerbated further when data is versioned

[Diagram: the denormalised Trade, Party, Trader copies repeated for Version 1, Version 2, Version 3 and Version 4]

…and you need versioning to do MVCC

And reconstituting a previous time slice

becomes very difficult

[Diagram: versions of Trade, Party and Trader entities scattered across the cluster]

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

[Diagram: Trade, Party and Trader held on separate nodes – independently versioned, each datum a singleton]

Binding them back together involves a "distributed join" => lots of network hops


Whereas in the denormalised model the join is already done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about

Looking more closely: Why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate

[Diagram: entities sharing common keys vs entities with crosscutting keys]

We tackle this problem with a hybrid model

[Diagram: Trade partitioned; Party and Trader replicated]

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small

Facts have one key that relates them all (used to partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts => big, common keys
Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join

… so we only want to 'join' data that is in the same process

[Diagram: Trades and MTMs collocated via a common key]

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence), as in the sketch below
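As a rough illustration (the MtmKey class and its fields are invented for this sketch; KeyAssociation is Coherence's interface), an MTM key can declare its parent trade id as its associated key, so the partitioning service places the MTM in the same partition, and therefore the same process, as its Trade:

    import java.io.Serializable;
    import com.tangosol.net.cache.KeyAssociation;

    // Illustrative key for an MTM entry: it collocates with the Trade that owns it.
    public class MtmKey implements KeyAssociation, Serializable {
        private final String mtmId;
        private final String tradeId;   // the common, partitioning key shared with the Trade

        public MtmKey(String mtmId, String tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }

        @Override
        public Object getAssociatedKey() {
            return tradeId;              // the cluster partitions on this, not on the MtmKey itself
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof MtmKey)) return false;
            MtmKey other = (MtmKey) o;
            return mtmId.equals(other.mtmId) && tradeId.equals(other.tradeId);
        }

        @Override
        public int hashCode() {
            return 31 * mtmId.hashCode() + tradeId.hashCode();
        }
    }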

So we prescribe different physical storage for Facts and Dimensions


Facts are partitioned; dimensions are replicated

[Diagram: Query Layer above a Data Layer holding Transactions, Cashflows and Mtms in partitioned Fact Storage]

Facts are partitioned; dimensions are replicated

[Diagram: Transactions, Cashflows and Mtms held in partitioned Fact Storage (facts: distribute/partition); dimension caches replicated]
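In Coherence terms this split might be wired up roughly as below (a sketch only: the cache names are invented, and whether each cache is backed by a distributed-scheme or a replicated-scheme is decided in the cache configuration, which is not shown):

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;

    public class FactAndDimensionCaches {
        public static void main(String[] args) {
            NamedCache trades  = CacheFactory.getCache("facts-trades");   // partitioned (distributed-scheme)
            NamedCache mtms    = CacheFactory.getCache("facts-mtms");     // partitioned, key-associated to trades
            NamedCache parties = CacheFactory.getCache("dims-party");     // replicated (replicated-scheme)
            NamedCache traders = CacheFactory.getCache("dims-trader");    // replicated

            trades.put("T1", "trade payload");
            parties.put("P1", "party payload");    // the replicated scheme copies this to every node
            traders.put("TR1", "trader payload");
        }
    }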

The data volumes back this up as a sensible hypothesis

Facts => big => distribute
Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key


So how do they help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern

[Diagram: a sequence of calls spread over the network and over time – Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers]

But by balancing Replication and Partitioning we don't need all those hops

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'


Stage 1: Get the right keys to query the Facts – join Dimensions in the Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'


Stage 2: Cluster Join to get Facts – Dimensions already joined in the Query Layer, now join Facts across the cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated


Stage 3: Augment raw Facts with relevant Dimensions – join Dimensions in the Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result (a sketch of the whole flow is below)
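Putting the three stages together in one place, a minimal, hypothetical Java sketch of the query flow (all names, types and the shape of the caches are invented for illustration; plain maps stand in for the replicated dimension caches and the partitioned fact storage):

    import java.util.*;
    import java.util.stream.Collectors;

    class ThreeStageQuerySketch {

        // Replicated dimension caches: local to every query-layer node.
        Map<String, String> bookToCostCentre = new HashMap<>();   // bookId -> costCentreId
        Map<String, String> costCentreNames  = new HashMap<>();   // costCentreId -> name

        // Partitioned fact storage, keyed by the partitioning key (here: bookId).
        Map<String, List<Fact>> factsByBook  = new HashMap<>();

        record Fact(String bookId, String type, double value) {}
        record Row(Fact fact, String costCentreName) {}

        List<Row> query(String costCentreId) {
            // Stage 1: join dimensions locally (no network hop) to turn the where
            // clause into the set of partitioning keys that hold the relevant facts.
            Set<String> bookIds = bookToCostCentre.entrySet().stream()
                    .filter(e -> e.getValue().equals(costCentreId))
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toSet());

            // Stage 2: join the facts; entries sharing a partitioning key are
            // collocated, so this join runs inside each partition in parallel.
            List<Fact> facts = bookIds.stream()
                    .flatMap(id -> factsByBook.getOrDefault(id, List.of()).stream())
                    .collect(Collectors.toList());

            // Stage 3: bind the relevant (replicated) dimensions to the result.
            String name = costCentreNames.get(costCentreId);
            return facts.stream().map(f -> new Row(f, name)).collect(Collectors.toList());
        }
    }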

Bringing it together

[Diagram: Java client API querying Replicated Dimensions and Partitioned Facts together]

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results

We get to do this…


…and this…


and this


…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts


This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data, some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

[Diagram: Data Layer with Transactions, Cashflows and Mtms in partitioned Fact Storage; Processing Layer with replicated Dimension Caches]

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st level references to be triggered

[Diagram: 'Save Trade' hits the Partitioned Cache in the Data Layer (all normalised); a Cache Store trigger pushes the referenced Party, Alias, Source, Book and Ccy into the Query Layer's connected dimension caches]

This updates the connected caches


The process recurses through the object graph

[Diagram: the recursion continues – the Party's own references (LedgerBook, a further Party) are pulled into the connected caches in the Query Layer]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
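A minimal sketch of how such a trigger might look (the Entity interface, method names and cache shape are invented for illustration; the real cache-store trigger in ODC is not shown in the slides): when a fact is saved, walk its foreign-key references and copy each referenced dimension, and that dimension's own references, into the replicated caches.

    import java.util.*;

    // A domain object that can enumerate its foreign-key references.
    interface Entity {
        Object key();
        Collection<Entity> references();
    }

    class ConnectedReplicator {
        private final Map<Object, Entity> replicatedDimensions = new HashMap<>();

        // Invoked by the cache-store trigger when a fact (e.g. a Trade) is written.
        void onFactSaved(Entity fact) {
            Set<Object> visited = new HashSet<>();
            for (Entity dimension : fact.references()) {
                replicate(dimension, visited);
            }
        }

        private void replicate(Entity dimension, Set<Object> visited) {
            if (!visited.add(dimension.key())) return;                        // guard against cycles
            if (replicatedDimensions.putIfAbsent(dimension.key(), dimension) == null) {
                // Newly 'connected': recurse so its own references are replicated too.
                for (Entity next : dimension.references()) {
                    replicate(next, visited);
                }
            }
        }
    }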

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so that we can do any join in a single step


Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End
• Further details online: http://www.benstopford.com
• Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do this... (hold Trade, Party and Trader as separate, normalised entities)

...and this... (version them independently: Version 1, Version 2, Version 3, Version 4)

...and this (share single copies of Party and Trader across many Trades)

...without the problems of this... (the duplication and consistency issues of denormalisation)

...or this (the distributed join and its network hops)

...all at the speed of this... well, almost

But there is a fly in the ointment...

I lied earlier. These aren't all Facts: some are Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff... we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, the large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty

Looking at the Dimension data, some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store, we keep our 'Connected Caches' up to date

Data Layer: Transactions, Cashflows, MTMs in Fact Storage (Partitioned)
Processing Layer: Dimension Caches (Replicated)

As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its first-level references to be triggered

Trade and its references: Party, Alias, SourceBook, Ccy

Data Layer (all normalised) / Query Layer (with connected dimension caches)

Save Trade -> Partitioned Cache -> Cache Store -> Trigger

This updates the connected caches: the Trade's references (Party, Alias, SourceBook, Ccy) are copied into the connected dimension caches in the Query Layer

The process recurses through the object graph, pulling in the references of those references in turn (Party, LedgerBook)

Data Layer (all normalised) / Query Layer (with connected dimension caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
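A minimal sketch of that recursion in plain Java; the Entity record, the onFactSaved hook and the in-memory set are illustrative stand-ins for the cache-store trigger and the replicated 'connected' caches, not the ODC implementation:

```java
import java.util.*;

public class ConnectedReplicationSketch {

    // An entity with a key and foreign-key references to other entities.
    record Entity(String key, List<Entity> references) {}

    // Stands in for the replicated 'connected dimension' caches.
    private final Set<String> connectedCache = new HashSet<>();

    // Called from the cache-store trigger when a fact (e.g. a Trade) is saved:
    // walk the foreign-key arcs and replicate every dimension we reach.
    public void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicateConnected(dimension);
        }
    }

    private void replicateConnected(Entity dimension) {
        // Already replicated => its sub-graph has been visited too.
        if (!connectedCache.add(dimension.key())) {
            return;
        }
        // Recurse through the dimension's own foreign keys (Alias -> Party, etc.)
        for (Entity next : dimension.references()) {
            replicateConnected(next);
        }
    }

    public static void main(String[] args) {
        Entity party = new Entity("Party:GS", List.of());
        Entity alias = new Entity("Alias:GS-LDN", List.of(party));
        Entity ccy = new Entity("Ccy:USD", List.of());
        Entity trade = new Entity("Trade:T1", List.of(alias, ccy));

        ConnectedReplicationSketch sketch = new ConnectedReplicationSketch();
        sketch.onFactSaved(trade);
        System.out.println(sketch.connectedCache); // only dimensions reachable from saved facts
    }
}
```

Dimensions never referenced by any stored fact are simply never pulled into the replicated layer, which is where the space saving comes from.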

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared-nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning, we can do any join in a single step against partitioned storage

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End
• Further details online: http://www.benstopford.com
• Questions


This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

(Diagram: Trade, Party and Trader held separately across the grid: independently versioned, each datum a singleton.)

Binding them back together again involves a "distributed join" => lots of network hops.

Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate. (Diagram: common keys vs. crosscutting keys.)

We tackle this problem with a hybrid model: Trade is partitioned; Party and Trader are replicated.

We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions.

Everything starts from a Core Fact (Trades for us).

Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition).

Dimensions have many keys (which crosscut the partitioning key).

Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.
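To make the Fact/Dimension split concrete, here is a minimal Java sketch; the class and field names are illustrative and not taken from ODC. The facts share the single partitioning key (the trade id), while each dimension has its own key that crosscuts it.

// Hypothetical domain sketch of the snowflake split used for partitioning.
public class SnowflakeSketch {
    // Facts: big, related by the one partitioning key (the trade id).
    record Trade(long tradeId, long partyId, long traderId, double notional) {}
    record Mtm(long mtmId, long tradeId, double value) {}

    // Dimensions: small, with their own crosscutting keys, referenced from many facts.
    record Party(long partyId, String name) {}
    record Trader(long traderId, String desk) {}
}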

We remember we are a grid: we should avoid the distributed join… so we only want to 'join' data that is in the same process.

Trades and MTMs share a common key, so we use a Key Assignment Policy (e.g. KeyAssociation in Coherence) to keep them in the same partition.
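As a minimal sketch of such a policy (the MtmKey class and its fields are hypothetical, not ODC's real types), the key below uses Coherence's KeyAssociation interface so that each MTM is stored in the same partition as its parent Trade; getAssociatedKey() is the hook Coherence consults when routing the entry.

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Illustrative cache key: MTM entries are collocated with the Trade they reference.
public class MtmKey implements Serializable, KeyAssociation {
    private final long mtmId;
    private final long tradeId;  // the common (partitioning) key shared with the Trade fact

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    // Coherence places this entry in the partition that owns the returned key,
    // so a Trade/MTM join never has to leave the owning process.
    @Override
    public Object getAssociatedKey() {
        return tradeId;
    }

    // equals() and hashCode() (required for a real cache key) are omitted for brevity.
}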

So we prescribe different physical storage for Facts and Dimensions: Trade is partitioned; Party and Trader are replicated.

Facts are partitioned, dimensions are replicated.

(Diagram: in the Data Layer, Fact Storage (partitioned) holds Transactions, Cashflows and MTMs; the Dimensions, such as Party and Trader, are replicated into the Query Layer.)

Facts => distribute (partition). Dimensions => replicate.

The data volumes back this up as a sensible hypothesis: Facts => big => distribute; Dimensions => small => replicate.

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

So how do they help us to run queries without distributed joins?

This query involves joins between Dimensions and joins between Facts:

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern? A chain of calls spread over the network in time: Get Cost Centres, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs… each one a separate network hop.

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause (Where Cost Centre = 'CC1'). Join the Dimensions in the Query Layer to get the right keys to query the Facts.

Stage 2: Cluster join to get the Facts. Join the facts together efficiently, as we know they are collocated in Partitioned Storage (Transactions, Cashflows, MTMs).

Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result in the Query Layer.

(Query: Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1')
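As a self-contained illustration of the three stages, the sketch below uses plain in-memory maps in place of the replicated dimension caches and the partitioned fact storage; all names and data are invented for the example, and in the real grid stage 2 runs inside each partition rather than over a local stream.

import java.util.*;
import java.util.stream.*;

// Simulates the three query stages with plain maps standing in for the grid.
public class ThreeStageQuerySketch {
    record Transaction(long id, String costCentre) {}
    record Mtm(long id, long transactionId, double value) {}

    public static void main(String[] args) {
        // Replicated dimension index: cost centre -> transaction keys (held on every node)
        Map<String, Set<Long>> costCentreIndex = Map.of("CC1", Set.of(1L, 2L));

        // Partitioned facts, all keyed by the shared partitioning key (the transaction id)
        Map<Long, Transaction> transactions = Map.of(
                1L, new Transaction(1L, "CC1"),
                2L, new Transaction(2L, "CC1"),
                3L, new Transaction(3L, "CC2"));
        List<Mtm> mtms = List.of(new Mtm(10L, 1L, 99.5), new Mtm(11L, 2L, 42.0));

        // Stage 1: resolve the where clause against the replicated dimensions
        Set<Long> keys = costCentreIndex.getOrDefault("CC1", Set.of());

        // Stage 2: join the facts; in the grid they are collocated, so this is partition-local
        List<String> rows = mtms.stream()
                .filter(m -> keys.contains(m.transactionId()))
                .map(m -> transactions.get(m.transactionId()) + " <-> " + m)
                .collect(Collectors.toList());

        // Stage 3: the relevant (replicated) dimensions would be bound to each row here
        rows.forEach(System.out::println);
    }
}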

Bringing it together: the Java client API talks to Replicated Dimensions and Partitioned Facts.

We never have to do a distributed join

So all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.

We get to do this… …and this… …and this… …without the problems of this… …or this… all at the speed of this… well, almost. (The diagrams contrast the normalised, independently versioned graph with the duplicated, denormalised one.)

But there is a fly in the ointment…

I lied earlier: these aren't all Facts.

This is a dimension: it has a different key to the Facts, and it's BIG.

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date. (Diagram: Data Layer, Fact Storage (partitioned) with Transactions, Cashflows and MTMs; Processing Layer, Dimension Caches (replicated).)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered.

(Diagram: Save Trade, Partitioned Cache, Cache Store, Trigger. The Trade's references – Party, Alias, Source, Book, Ccy – are pushed from the Data Layer (all normalised) into the Query Layer's connected dimension caches.)

This updates the connected caches

(Diagram: the connected dimension caches in the Query Layer now hold Party, Alias, Source, Book and Ccy.)

The process recurses through the object graph

(Diagram: the recursion continues from Party to LedgerBook and beyond, keeping the Query Layer's connected dimension caches in step with the normalised Data Layer.)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
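A minimal sketch of that recursion, assuming a generic Entity abstraction and a plain map standing in for the replicated dimension caches (none of these names come from ODC itself):

import java.util.*;

// Hedged sketch of Connected Replication: when a fact is written, walk its foreign-key
// references and push every newly connected dimension into the replicated caches.
public class ConnectedReplicationSketch {
    interface Entity {
        Object key();               // the entity's own key
        List<Entity> references();  // its foreign-key references (the arcs of the domain model)
    }

    private final Map<Object, Entity> replicatedDimensions = new HashMap<>();

    // Invoked by the cache store / trigger when a fact such as a Trade is saved.
    public void onFactSaved(Entity fact) {
        fact.references().forEach(this::replicate);
    }

    // Recurse through the object graph, replicating each connected dimension exactly once.
    private void replicate(Entity dimension) {
        if (replicatedDimensions.putIfAbsent(dimension.key(), dimension) == null) {
            dimension.references().forEach(this::replicate);
        }
    }
}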

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:

• Data set size: the size of connected dimensions limits scalability.

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.

• At one end of the scale are the huge shared nothing architectures. These favour scalability.

• At the other end are in-memory architectures, ideally using a single address space.

• You can blend the two approaches (for example ODC).

• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step.

• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com

• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate.

We tackle this problem with a hybrid model: the Trade is partitioned; Party and Trader are replicated.

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition).

Dimensions have many keys (which crosscut the partitioning key).

Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.

We remember we are a grid. We should avoid the distributed join…

…so we only want to 'join' data that is in the same process.

Trades and MTMs share a common key, so we use a key assignment policy (e.g. KeyAssociation in Coherence) to place them in the same partition.
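A minimal sketch of such a policy, assuming Coherence's KeyAssociation interface and a hypothetical MtmKey class: by returning the owning trade's id as the associated key, every MTM hashes to the same partition as its Trade, so the Trade/MTM join never leaves the process.

```java
import com.tangosol.net.cache.KeyAssociation;

// Hypothetical composite key for an MTM fact. Returning the owning trade's id
// as the associated key makes Coherence hash MTM entries to the same partition
// as the Trade keyed by that id, so the Trade/MTM join stays in-process.
public class MtmKey implements KeyAssociation, java.io.Serializable {

    private final String mtmId;
    private final String tradeId;   // the common (partitioning) key

    public MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;             // collocate with the owning Trade
    }

    // equals/hashCode use both fields so distinct MTMs remain distinct keys.
    @Override
    public boolean equals(Object o) {
        return o instanceof MtmKey
                && mtmId.equals(((MtmKey) o).mtmId)
                && tradeId.equals(((MtmKey) o).tradeId);
    }

    @Override
    public int hashCode() {
        return 31 * mtmId.hashCode() + tradeId.hashCode();
    }
}
```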

So we prescribe different physical storage for Facts and Dimensions: the Trade (a fact) is partitioned; Party and Trader (dimensions) are replicated.

Facts are partitioned; dimensions are replicated. The Data Layer holds Transactions, Cashflows and Mtms in partitioned Fact Storage, while the Query Layer holds the replicated dimensions (Party, Trader).

Facts are partitioned; dimensions are replicated. Transactions, Cashflows and Mtms are Facts (distributed/partitioned); the smaller reference entities are Dimensions (replicated).

The data volumes back this up as a sensible hypothesis: Facts => big => distribute. Dimensions => small => replicate.

Key Point: We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and to replicate small stuff whose keys can't map to our partitioning key.

So how do they help us to run queries without distributed joins?

This query involves joins between Dimensions and joins between Facts:

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern? Get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs… a chain of lookups, each one a network hop spread over time.

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause, Where Cost Centre = 'CC1'. The Facts (Transactions, Cashflows, Mtms) sit in partitioned storage.

Stage 1: Get the right keys to query the Facts by joining the Dimensions in the Query Layer.

Stage 2: Cluster join to get the Facts. The facts are joined together efficiently because we know they are collocated in the same partitions.

Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result in the Query Layer.
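Putting the three stages together, here is a minimal plain-Java sketch of the query path. The record types, the dimension arcs (Cost Centre to LedgerBook to SourceBook) and the queryFactsInParallel stand-in are illustrative assumptions rather than the real ODC API; the point is that stages 1 and 3 touch only locally replicated dimensions, while stage 2 is a single parallel call against the partitioned facts.

```java
import java.util.*;
import java.util.stream.Collectors;

public class ThreeStageQuery {

    // Hypothetical dimension and fact types.
    record LedgerBook(String id, String costCentreId) {}
    record SourceBook(String id, String ledgerBookId) {}
    record Transaction(String id, String sourceBookId) {}

    // Replicated dimension caches: a full local copy on every query-layer node.
    private final Map<String, LedgerBook> ledgerBooks = new HashMap<>();
    private final Map<String, SourceBook> sourceBooks = new HashMap<>();

    public List<Object[]> run(String costCentreId) {
        // Stage 1: join the dimensions locally to turn "Cost Centre = 'CC1'"
        // into the set of source-book keys that the facts reference.
        Set<String> bookIds = ledgerBooks.values().stream()
                .filter(lb -> lb.costCentreId().equals(costCentreId))
                .flatMap(lb -> sourceBooks.values().stream()
                        .filter(sb -> sb.ledgerBookId().equals(lb.id())))
                .map(SourceBook::id)
                .collect(Collectors.toSet());

        // Stage 2: one parallel call into the cluster. Transactions and MTMs
        // share a partitioning key, so each node joins its own facts locally.
        List<Object[]> rows = queryFactsInParallel(bookIds);

        // Stage 3: bind the locally held dimension objects onto each row.
        for (Object[] row : rows) {
            Transaction tx = (Transaction) row[0];
            row[2] = sourceBooks.get(tx.sourceBookId());   // attach reference data
        }
        return rows;
    }

    // Stand-in for the grid-side fact join (e.g. a filter or aggregator call).
    private List<Object[]> queryFactsInParallel(Set<String> sourceBookIds) {
        return new ArrayList<>();   // each row: [Transaction, MTM, referenceData]
    }
}
```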

Bringing it together: the Java client API runs the query against replicated Dimensions and partitioned Facts.

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and without intermediate results.

We get to do this… (keep the entities normalised and singleton)

…and this… (hold multiple independent versions)

…and this… (reconstitute previous time slices)

…without the problems of this… or this… all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier: these aren't all Facts. Some of them are Dimensions. This is a dimension: it has a different key to the Facts, and it's BIG.

We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.

Fortunately there is a simple solution

The Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date: the Data Layer holds Transactions, Cashflows and Mtms in partitioned Fact Storage, and the Processing Layer holds the replicated Dimension Caches.

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all of its 1st-level references to be triggered.

(A 'Save Trade' against the partitioned cache fires a cache-store trigger; the Trade's first-level references, such as Party, Alias, Source Book and Ccy, flow from the fully normalised Data Layer into the Query Layer's connected dimension caches.)

This updates the connected caches


The process recurses through the object graph

(The recursion continues from Party to LedgerBook, so second-level dimensions are pulled into the connected caches too.)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
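A minimal sketch of that recursion, using hypothetical types (an Entity interface and an arcs map standing in for the domain model's foreign keys). In ODC the real hook is a cache store or trigger on the partitioned fact cache, but the shape is the same: when a fact is saved, walk its references and push each newly seen dimension into the replicated connected caches.

```java
import java.util.*;
import java.util.function.Function;

public class ConnectedReplicationTrigger {

    // Hypothetical domain abstraction: anything with a unique key.
    interface Entity { String key(); }

    // For each entity type, the arcs (foreign-key navigations) to the
    // dimensions it references; population of this map is omitted here.
    private final Map<Class<?>, List<Function<Entity, Entity>>> arcs = new HashMap<>();

    // Stand-in for the replicated 'connected' dimension caches.
    private final Map<String, Entity> connectedCaches = new HashMap<>();

    // Called by the cache store / trigger whenever a fact (e.g. a Trade) is written.
    public void onSave(Entity fact) {
        replicateReferences(fact, new HashSet<>());
    }

    private void replicateReferences(Entity entity, Set<String> visited) {
        for (Function<Entity, Entity> arc : arcs.getOrDefault(entity.getClass(), List.of())) {
            Entity dimension = arc.apply(entity);
            if (dimension == null || !visited.add(dimension.key())) {
                continue;                                    // null arc or already handled
            }
            connectedCaches.put(dimension.key(), dimension); // it is now 'connected'
            replicateReferences(dimension, visited);         // recurse: Party -> LedgerBook, etc.
        }
    }
}
```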

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion: Traditional database architectures are inappropriate for very low latency or very high throughput applications.

At one end of the scale are the huge shared nothing architectures. These favour scalability.

At the other end are in-memory architectures, ideally using a single address space.

You can blend the two approaches (for example ODC).

ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, we can do any join in a single step against partitioned storage.

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End. Further details online at http://www.benstopford.com. Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches


The process recurses through the object graph

(From the Trade's Party the recursion continues on to that Party's LedgerBook, and so on through the normalised Data Layer, populating the Query Layer's connected dimension caches.)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
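A minimal sketch of those mechanics, with hypothetical types rather than ODC's actual classes: a save trigger walks the saved entity's outbound references recursively and copies each dimension it reaches into the replicated 'connected' cache, so only dimensions actually referenced by facts ever get replicated.

import java.util.*;

public class ConnectedReplicationSketch {

    /** Any entity that can point at other entities: the 'arcs' on the domain model. */
    interface Node {
        String key();
        List<Node> references();
    }

    record Dimension(String key, List<Node> references) implements Node {}
    record Trade(String key, List<Node> references) implements Node {}

    /** The replicated 'connected dimension' cache kept alongside the query layer. */
    static final Map<String, Node> connectedCache = new HashMap<>();

    /** Trigger fired when an entity is written to the data layer. */
    static void onSave(Node saved) {
        for (Node referenced : saved.references()) {
            // Only recurse into dimensions we have not replicated yet.
            if (connectedCache.putIfAbsent(referenced.key(), referenced) == null) {
                onSave(referenced);               // recurse through the object graph
            }
        }
    }

    public static void main(String[] args) {
        Dimension ledgerBook = new Dimension("LedgerBook:9", List.of());
        Dimension party      = new Dimension("Party:Goldmans", List.of(ledgerBook));
        Dimension ccy        = new Dimension("Ccy:GBP", List.of());
        Trade trade          = new Trade("Trade:T1", List.of(party, ccy));

        onSave(trade);   // Save Trade -> Party and Ccy -> LedgerBook all become 'connected'
        System.out.println("Replicated (connected) dimensions: " + connectedCache.keySet());
    }
}

The putIfAbsent check is what stops the recursion: once a dimension is already 'connected', its sub-graph has been replicated too, so nothing is visited twice.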

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?






Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3: Augment raw Facts with relevant Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
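Pulling the three stages together in code, very roughly (the cache names, the Row type and the helper methods are assumptions made for this sketch, not the real ODC API): stages 1 and 3 only touch replicated data that is already in-process, and stage 2 is a single bulk call against the collocated Fact partitions.

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class CostCentreQuery {

    // Hypothetical shape of a result row.
    public static final class Row {
        public final Object transaction, mtm, referenceData;
        Row(Object t, Object m, Object r) { transaction = t; mtm = m; referenceData = r; }
    }

    public List<Row> run(String costCentre) {
        // Replicated dimension caches: every node holds a full copy, so these
        // reads never leave the process.
        NamedCache costCentres = CacheFactory.getCache("dim-cost-centres");
        NamedCache refData     = CacheFactory.getCache("dim-reference-data");

        // Stage 1: join dimensions locally (CostCentre -> LedgerBook -> SourceBook)
        // to derive the partitioning keys of the Facts we need.
        Collection<Long> tradeKeys = resolveTradeKeys(costCentres, costCentre);

        // Stage 2: one bulk fetch against the partitioned Fact caches. Because
        // Transactions and MTMs share a partitioning key, each partition joins
        // its own facts without talking to its peers.
        NamedCache transactions = CacheFactory.getCache("fact-transactions");
        NamedCache mtms         = CacheFactory.getCache("fact-mtms");
        Map txByKey  = transactions.getAll(tradeKeys);
        Map mtmByKey = mtms.getAll(tradeKeys);

        // Stage 3: augment the raw Facts with dimension data from the locally
        // replicated caches; again, no further network hops are needed.
        List<Row> out = new ArrayList<>();
        for (Object key : tradeKeys) {
            Object tx  = txByKey.get(key);
            Object mtm = mtmByKey.get(key);
            Object ref = refData.get(dimensionKeyOf(tx));
            out.add(new Row(tx, mtm, ref));
        }
        return out;
    }

    // Placeholders for the dimension walk and the foreign key read; in ODC
    // these live in the query layer and the domain model respectively.
    private Collection<Long> resolveTradeKeys(NamedCache costCentres, String costCentre) {
        return Collections.emptyList();
    }

    private Object dimensionKeyOf(Object fact) {
        return null;
    }
}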

Bringing it together

Java client API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results

We get to do this…

(Diagram: normalised Trade, Party and Trader graphs)

…and this…

(Diagram: Trade, Party and Trader at Versions 1 to 4)

and this

(Diagram: many Trades referencing shared Party and Trader entities)

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

Facts

Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

(Diagram: Dimension Caches (Replicated) in the Processing Layer; Transactions, Cashflows and Mtms in Fact Storage (Partitioned) in the Data Layer)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st level references to be triggered

(Diagram: Save Trade into the Partitioned Cache fires a Cache Store trigger; the Trade's references (Party, Alias, Source Book, Ccy) flow from the Data Layer (All Normalised) to the Query Layer (with connected dimension Caches))

This updates the connected caches

(Diagram: Party, Alias, Source Book and Ccy now also sit in the Query Layer's connected dimension caches, above the Data Layer (All Normalised))

The process recurses through the object graph

(Diagram: the recursion continues through the object graph, for example from Party on to LedgerBook, pulling each newly connected dimension into the Query Layer caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
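A hedged sketch of that recursion is below; the Entity abstraction, the cache naming convention and the point at which onFactWritten is invoked (from a cache store or trigger, as in the diagrams above) are assumptions made for illustration rather than ODC's real implementation.

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ConnectedReplicator {

    // Hypothetical view of a stored object: the (cacheName, key) pairs of the
    // dimensions it references.
    public interface Entity {
        Map<String, Object> foreignKeys();
    }

    // Called when a Fact (e.g. a Trade) is written to the normalised data layer.
    public void onFactWritten(Entity fact) {
        replicateConnected(fact, new HashSet<>());
    }

    private void replicateConnected(Entity entity, Set<Object> visited) {
        for (Map.Entry<String, Object> ref : entity.foreignKeys().entrySet()) {
            String dimensionCache = ref.getKey();
            Object key = ref.getValue();
            if (!visited.add(dimensionCache + ":" + key)) {
                continue; // already pushed on this pass
            }

            // Read the dimension from the normalised data layer...
            NamedCache source = CacheFactory.getCache(dimensionCache);
            Object dimension = source.get(key);
            if (dimension == null) {
                continue;
            }

            // ...and copy it into the replicated 'connected' cache that the
            // query layer reads from.
            NamedCache replica = CacheFactory.getCache("connected-" + dimensionCache);
            replica.put(key, dimension);

            // Recurse: Trade -> Party -> LedgerBook and so on, so the whole
            // connected sub-graph is replicated and nothing else is.
            if (dimension instanceof Entity) {
                replicateConnected((Entity) dimension, visited);
            }
        }
    }
}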

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability

Conclusion

At the other end are in-memory architectures, ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so that we can do any join in a single step

Partitioned Storage

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End

• Further details online: http://www.benstopford.com

• Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts => Big => Distribute

Dimensions => Small => Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate the small stuff whose keys can't map to our

partitioning key

Replicate

Distribute

So how do they help us to run queries without

distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transactions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we don't need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = 'CC1'

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3 Bind relevant dimensions to the result
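
To make the three stages concrete, here is a small, self-contained Java sketch that uses plain maps as stand-ins for the replicated dimension caches and the partitioned fact store; every class, field and value in it is an illustrative assumption, not the ODC API.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ThreeStageQuerySketch {

    record CostCentre(long id, String name) {}
    record LedgerBook(long id, long costCentreId) {}
    record Transaction(long tradeId, long ledgerBookId) {}   // fact, partitioned by tradeId
    record Mtm(long tradeId, double value) {}                 // fact, collocated by tradeId

    public static void main(String[] args) {
        // Replicated dimensions: available locally wherever the query runs.
        Map<Long, CostCentre> costCentres = Map.of(1L, new CostCentre(1, "CC1"));
        Map<Long, LedgerBook> ledgerBooks = Map.of(10L, new LedgerBook(10, 1));

        // Partitioned facts: in the real grid these are spread across the cluster.
        List<Transaction> transactions = List.of(new Transaction(100, 10));
        Map<Long, Mtm> mtmsByTrade = Map.of(100L, new Mtm(100, 42.0));

        // Stage 1: resolve the where clause against the local, replicated
        // dimensions to get the keys needed to query the facts.
        Set<Long> bookIds = ledgerBooks.values().stream()
                .filter(b -> "CC1".equals(costCentres.get(b.costCentreId()).name()))
                .map(LedgerBook::id)
                .collect(Collectors.toSet());

        // Stage 2: join the facts. Transactions and MTMs share the trade key,
        // so in the cluster this join never leaves the owning partition.
        var facts = transactions.stream()
                .filter(t -> bookIds.contains(t.ledgerBookId()))
                .map(t -> Map.entry(t, mtmsByTrade.get(t.tradeId())))
                .toList();

        // Stage 3: bind the relevant dimension data to the result.
        facts.forEach(e -> {
            CostCentre cc = costCentres.get(ledgerBooks.get(e.getKey().ledgerBookId()).costCentreId());
            System.out.println(e.getKey() + " " + e.getValue() + " " + cc);
        });
    }
}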

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do this…

Trade

Party

Trader

Trade

Party

Trader

…and this…

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

Facts

Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate

'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

Data Layer

Dimension Caches (Replicated)

Transactions

Cashflows

Processing Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the

domain model ensuring only 'Connected' dimensions are

replicated
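
A hedged Java sketch of that recursion (the Entity interface, its references() accessor and the map it writes to are assumptions made for illustration): when a fact is saved, we walk its outgoing references and copy every dimension we reach into the replicated layer, so only connected dimensions are ever replicated.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ConnectedReplicationSketch {

    // Illustrative stand-in for a domain object that knows its foreign-key references.
    interface Entity {
        Object key();
        List<Entity> references();   // the arcs of the domain model
    }

    // Stand-in for the replicated dimension caches in the query layer.
    private final Map<Object, Entity> replicatedDimensions = new HashMap<>();

    // Called (e.g. from a cache-store trigger) whenever a fact such as a Trade is saved.
    public void onFactSaved(Entity fact) {
        Set<Object> visited = new HashSet<>();
        for (Entity dimension : fact.references()) {
            replicateConnected(dimension, visited);
        }
    }

    // Recurse through the object graph so that only dimensions actually reachable
    // from a stored fact ('connected' dimensions) end up in the replicated layer.
    private void replicateConnected(Entity dimension, Set<Object> visited) {
        if (!visited.add(dimension.key())) {
            return;   // already handled on this pass; also guards against cycles
        }
        replicatedDimensions.put(dimension.key(), dimension);
        for (Entity next : dimension.references()) {
            replicateConnected(next, visited);
        }
    }
}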

With 'Connected Replication' only 1/10th of the data

needs to be replicated (on

average)

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End
• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data Layer / Query Layer: Transactions, Cashflows, Mtms — Fact Storage (Partitioned); Trade — Party/Trader

Facts are partitioned, dimensions are replicated

Transactions, Cashflows, Mtms — Fact Storage (Partitioned) — Facts (distribute/partition); Dimensions (replicate)

The data volumes back this up as a sensible hypothesis

Facts => big => distribute

Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key

Replicate

Distribute

So how does this help us to run queries without distributed joins?

This query involves: • Joins between Dimensions • Joins between Facts

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = 'CC1'

What would this look like without this pattern

Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers — one network hop after another, spread out over time

But by balancing Replication and Partitioning we don't need all those hops

Stage 1 Focus on the where clause

Where Cost Centre = 'CC1'

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = 'CC1'

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = 'CC1'

Stage 2 Join the facts together efficiently as we know they are collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = 'CC1'

Stage 3 Bind relevant dimensions to the result
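Sketched in Java, the three stages might look something like this (the cache names and the two abstract helpers are assumptions for illustration, not the actual ODC client API):

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import java.util.Map;
import java.util.Set;

public abstract class CostCentreQuery {

    // Hypothetical helpers: resolving the where-clause against the replicated
    // dimensions, and binding dimensions back onto the fetched facts
    protected abstract Set resolveFactKeys(NamedCache costCentres, String costCentre);
    protected abstract Map bindDimensions(Map transactions, Map mtms, NamedCache costCentres);

    public Map runFor(String costCentre) {
        NamedCache costCentres  = CacheFactory.getCache("CostCentre");   // replicated dimension
        NamedCache transactions = CacheFactory.getCache("Transaction");  // partitioned fact
        NamedCache mtms         = CacheFactory.getCache("Mtm");          // partitioned fact

        // Stage 1: join the replicated dimensions locally to turn the where-clause
        // into a set of fact keys (no network hops - every node holds the dimensions)
        Set factKeys = resolveFactKeys(costCentres, costCentre);

        // Stage 2: one parallel fetch of the co-located facts; Transactions and MTMs
        // share the partitioning key, so this is a single-step cluster join
        Map txns  = transactions.getAll(factKeys);
        Map marks = mtms.getAll(factKeys);

        // Stage 3: bind the relevant dimensions to the result from the local replicas
        return bindDimensions(txns, marks, costCentres);
    }
}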

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results

We get to do this…

Trade

Party

Trader

Trade

Party

Trader

…and this…

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

Facts

Dimensions

This is a dimension: • It has a different key to the Facts • And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

Data Layer: Transactions, Cashflows, Mtms — Fact Storage (Partitioned)

Processing Layer: Dimension Caches (Replicated)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger
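A minimal sketch of that hook (the class and the Trade fields here are purely illustrative; in ODC the equivalent logic sits behind the cache store / trigger on the partitioned Trade cache):

import java.util.Map;

public class TradeSaveTrigger {

    // Illustrative domain type: just the foreign keys we care about here
    public static class Trade {
        Object partyId, bookId, sourceId, ccy;
    }

    private final Map dataLayer;        // normalised store: dimension key -> dimension
    private final Map connectedCaches;  // replicated 'connected' caches: key -> dimension

    public TradeSaveTrigger(Map dataLayer, Map connectedCaches) {
        this.dataLayer = dataLayer;
        this.connectedCaches = connectedCaches;
    }

    // Called whenever a Trade is written to the partitioned cache: each first-level
    // reference is copied from the normalised data layer into the replicated layer
    public void onTradeSaved(Trade trade) {
        for (Object refKey : new Object[] { trade.partyId, trade.bookId, trade.sourceId, trade.ccy }) {
            if (refKey != null && !connectedCaches.containsKey(refKey)) {
                connectedCaches.put(refKey, dataLayer.get(refKey));
            }
        }
    }
}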

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
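Expressed as code, the recursive step might look like this (a sketch only: the three abstract methods stand in for ODC's real domain-model metadata, normalised data layer and replicated caches):

import java.util.Map;
import java.util.Set;

public abstract class ConnectedReplicator {

    // Assumptions standing in for the real plumbing
    protected abstract Iterable<Object> foreignKeysOf(Object entity);            // arcs on the domain model
    protected abstract Object lookupInDataLayer(Object foreignKey);              // normalised, partitioned store
    protected abstract Map<Object, Object> connectedCacheFor(Object dimension);  // replicated 'connected' cache

    // Walk the foreign keys of a just-written entity, copy each referenced
    // dimension into its replicated cache, then recurse into that dimension
    public void replicateConnected(Object entity, Set<Object> visited) {
        for (Object fk : foreignKeysOf(entity)) {
            if (!visited.add(fk)) {
                continue;                               // already handled on this pass
            }
            Object dimension = lookupInDataLayer(fk);
            if (dimension != null) {
                connectedCacheFor(dimension).put(fk, dimension);
                replicateConnected(dimension, visited); // recurse through the object graph
            }
        }
    }
}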

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: size of connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability

Conclusion

At the other end are in memory architectures, ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End

• Further details online: http://www.benstopford.com

• Questions





Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2: Cluster Join to get Facts.

Join Dimensions in Query Layer. Join Facts across the cluster.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated in Partitioned Storage.

Stage 3: Augment raw Facts with relevant Dimensions.

Join Dimensions in Query Layer. Join Facts across the cluster. Join Dimensions in Query Layer.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result.
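The three stages can be sketched in plain Java, with maps standing in for the Coherence caches; all class and field names below (CostCentre, costCentreId, and so on) are illustrative rather than the real ODC model.

import java.util.*;

public class ThreeStageQuery {

    record CostCentre(String id, String name) {}              // replicated dimension
    record Transaction(long tradeId, String costCentreId) {}  // partitioned fact
    record Mtm(long tradeId, double value) {}                  // partitioned fact

    public static void main(String[] args) {
        // Replicated dimension cache: held in full on every node.
        Map<String, CostCentre> costCentres = Map.of(
                "CC1", new CostCentre("CC1", "Rates"),
                "CC9", new CostCentre("CC9", "Credit"));

        // Partitioned fact caches, collocated by the tradeId partitioning key.
        Map<Long, Transaction> transactions = Map.of(
                1L, new Transaction(1L, "CC1"),
                2L, new Transaction(2L, "CC9"));
        Map<Long, Mtm> mtms = Map.of(
                1L, new Mtm(1L, 10_500.0),
                2L, new Mtm(2L, -200.0));

        // Stage 1: evaluate the where clause against the local, replicated
        // dimensions to get the keys used to query the Facts (no network hop).
        Set<String> wantedCostCentres = Set.of("CC1");

        // Stage 2: one sweep over the partitioned Facts; the Transaction->MTM
        // join is in-process because both share the tradeId partitioning key.
        for (Transaction t : transactions.values()) {
            if (!wantedCostCentres.contains(t.costCentreId())) continue;
            Mtm mtm = mtms.get(t.tradeId());

            // Stage 3: bind the relevant dimension data onto the result, again locally.
            CostCentre cc = costCentres.get(t.costCentreId());
            System.out.printf("trade=%d mtm=%.2f costCentre=%s%n",
                    t.tradeId(), mtm.value(), cc.name());
        }
    }
}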

Bringing it together:

[Diagram: a Java client calls the API; queries join Replicated Dimensions with Partitioned Facts.]

We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and holding intermediate results.

We get to do this…

[Diagram: the normalised object graph (Trade, Party, Trader), each entity held once.]

…and this…

[Diagram: the same graph at Versions 1, 2, 3 and 4, each entity versioned independently.]

…and this…

[Diagram: a larger graph of many Trades sharing Parties and Traders.]

…without the problems of this…

…or this…

…all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: some of the entities treated as Facts are really Dimensions.]

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: the Data Layer holds the Facts (Transactions, Cashflows, MTMs) in partitioned Fact Storage; the Processing Layer holds the replicated Dimension Caches.]

As new Facts are added, relevant Dimensions that they reference are moved to the processing layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

[Diagram: 'Save Trade' writes to the Partitioned Cache in the Data Layer (all normalised); a Cache Store trigger fires for the Trade's first-level references (Party, Alias, SourceBook, Ccy), pushing them into the Query Layer's connected dimension caches.]

This updates the connected caches.

[Diagram: the same graph shown in the Data Layer (all normalised) and the Query Layer (with connected dimension caches), now holding the Trade's references.]

The process recurses through the object graph.

[Diagram: the recursion continues to second-level references, for example a LedgerBook reached via the Party, which are also pushed into the Query Layer's connected dimension caches.]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
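A minimal sketch of that recursion, assuming a hypothetical Entity abstraction whose references() method exposes the foreign-key arcs; in ODC the equivalent logic sits behind the cache-store trigger shown above rather than in a standalone class.

import java.util.*;

public class ConnectedReplication {

    interface Entity {
        Object key();
        List<Entity> references();   // the arcs (foreign keys) of the domain model
    }

    // Stand-in for the replicated 'connected' dimension caches, keyed by entity type.
    private final Map<Class<?>, Map<Object, Entity>> connectedCaches = new HashMap<>();

    // Called (e.g. from a trigger/cache store) whenever a Fact such as a Trade is saved.
    public void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicateIfAbsent(dimension);
        }
    }

    private void replicateIfAbsent(Entity dimension) {
        Map<Object, Entity> cache =
                connectedCaches.computeIfAbsent(dimension.getClass(), c -> new HashMap<>());
        // Only recurse the first time we see this dimension; an already-connected
        // dimension (and everything it references) is in the caches already.
        if (cache.putIfAbsent(dimension.key(), dimension) == null) {
            for (Entity next : dimension.references()) {
                replicateIfAbsent(next);
            }
        }
    }
}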

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:

• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition).

Dimensions have many keys (which crosscut the partitioning key).

Looking at the data

Facts => big, common keys

Dimensions => small, crosscutting keys
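
To make that key structure concrete, here is a minimal Java sketch (the class and field names are illustrative, not taken from ODC) of a Fact carrying the single partitioning key and a Dimension with its own, crosscutting key:

// Fact: big, keyed by the one partitioning key (tradeId).
class Trade {
    long tradeId;      // the common key used to partition the grid
    long partyId;      // foreign keys into dimensions
    long traderId;
    double notional;
}

// Dimension: small, with its own key that crosscuts the partitioning key.
class Party {
    long partyId;      // many Trades, on many partitions, reference this key
    String name;
}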

We remember we are a grid. We should avoid the distributed join…

… so we only want to 'join' data that is in the same process.

Trades and MTMs share a common key, so we use a key assignment policy (e.g. KeyAssociation in Coherence) to collocate them.
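
A minimal sketch of such a policy using Coherence's KeyAssociation interface (the class and field names here are illustrative, not ODC's): the MTM's cache key reports the owning trade's id as its associated key, so Coherence places each MTM in the same partition as its Trade.

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Cache key for an MTM entry. Coherence partitions by the associated key,
// so every MTM lands in the same partition as the Trade it belongs to.
class MtmKey implements KeyAssociation, Serializable {
    private final long mtmId;
    private final long tradeId;   // the common, partitioning key

    MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;            // collocate with the owning Trade's key
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MtmKey k && k.mtmId == mtmId && k.tradeId == tradeId;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(mtmId) * 31 + Long.hashCode(tradeId);
    }
}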

So we prescribe different physical storage for Facts and Dimensions.

Trade => partitioned. Party, Trader => replicated.

Facts are partitioned; dimensions are replicated.

[Diagram: Data Layer and Query Layer — Transactions, Cashflows and MTMs in Fact Storage (Partitioned); Trade, Party, Trader]

Facts are partitioned; dimensions are replicated.

[Diagram: Facts (Transactions, Cashflows, MTMs) are distributed across Fact Storage (Partitioned); Dimensions are replicated]

The data volumes back this up as a sensible hypothesis:

Facts => big => distribute

Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and to replicate small stuff whose keys can't map to our partitioning key.

So how do they help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a sequence of network hops spread over time — Get Cost Centres, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centres]

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'.

Stage 1: Get the right keys to query the Facts by joining the Dimensions in the Query Layer.

[Diagram: Transactions, Cashflows and MTMs in Partitioned Storage]

Stage 2: Cluster join to get the Facts: join the Facts across the cluster. This is efficient as we know they are collocated.

[Diagram: Transactions, Cashflows and MTMs in Partitioned Storage]

Stage 3: Augment the raw Facts with the relevant Dimensions: bind the replicated Dimensions to the result in the Query Layer.
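
As a rough illustration of the three stages in plain Java (simple maps stand in for the replicated dimension caches and the partitioned fact store; all names are hypothetical, and in the real grid Stage 2 runs inside each partition rather than over one local map):

import java.util.*;
import java.util.stream.*;

class ThreeStageQuerySketch {
    record Trade(long tradeId, long costCentreId, double mtm) {}   // fact (partitioned)
    record CostCentre(long id, String code) {}                     // dimension (replicated)

    static List<String> query(Map<Long, Trade> partitionedFacts,
                              Map<Long, CostCentre> replicatedDims) {
        // Stage 1: resolve the where clause against the local, replicated
        // dimensions to get the keys that select the right facts.
        Set<Long> ccIds = replicatedDims.values().stream()
                .filter(cc -> "CC1".equals(cc.code()))
                .map(CostCentre::id)
                .collect(Collectors.toSet());

        // Stage 2: join the facts. In the grid this step runs inside each
        // partition, because facts sharing a partitioning key are collocated.
        List<Trade> facts = partitionedFacts.values().stream()
                .filter(t -> ccIds.contains(t.costCentreId()))
                .collect(Collectors.toList());

        // Stage 3: bind the relevant (replicated) dimensions to the result.
        return facts.stream()
                .map(t -> t.tradeId() + " " + t.mtm() + " "
                        + replicatedDims.get(t.costCentreId()).code())
                .collect(Collectors.toList());
    }
}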

Bringing it together

[Diagram: Java client API over Replicated Dimensions and Partitioned Facts]

We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.

We get to do this… …and this… …and this…

[Diagrams: joined Trade, Party and Trader object graphs; versioned copies (Trader Versions 1 to 4); and many Trades sharing the same Party and Trader instances]

…without the problems of this… …or this… all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts; some are Dimensions.

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large.

But Connected Dimension data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: Data Layer with Fact Storage (Partitioned) holding Transactions, Cashflows and MTMs; Processing Layer with Dimension Caches (Replicated)]

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered.

[Diagram: Save Trade → Partitioned Cache → Cache Store → Trigger; Data Layer (all normalised) and Query Layer (with connected dimension caches); Trade referencing Party, Alias, Source, Book, Ccy]

This updates the connected caches.

[Diagram: Trade, Party, Alias, Source, Book, Ccy; Data Layer (all normalised); Query Layer (with connected dimension caches)]

The process recurses through the object graph.

[Diagram: Trade → Party, Alias, Source, Book, Ccy → Party, LedgerBook; Data Layer (all normalised); Query Layer (with connected dimension caches)]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
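
A minimal sketch of that recursion in Java (the Entity interface and the set used here are hypothetical stand-ins for the real domain model and the replicated layer):

import java.util.*;

class ConnectedReplicationSketch {
    // An arc-bearing node in the domain model: a fact or dimension that can
    // name the other entities it references via foreign keys.
    interface Entity {
        Collection<Entity> references();
    }

    // Stands in for the replicated 'connected' dimension caches.
    private final Set<Entity> connectedCache = new HashSet<>();

    // Called when a fact (e.g. a Trade) is saved to the partitioned store.
    void onFactSaved(Entity fact) {
        fact.references().forEach(this::replicateIfNew);
    }

    // Recurse through the object graph so that only dimensions reachable
    // from a stored fact ('connected' dimensions) ever get replicated.
    private void replicateIfNew(Entity dimension) {
        if (connectedCache.add(dimension)) {     // add() is false if already connected
            dimension.references().forEach(this::replicateIfNew);
        }
    }
}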

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

(Diagram: the recursion reaches second-level references such as the Party's LedgerBook, in addition to Trade, Party, Alias, SourceBook and Ccy. Data Layer (All Normalised), Query Layer (with connected dimension caches).)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
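A minimal sketch of that recursion in Java. The Entity interface, the trigger hook and the cache names are assumptions for illustration, not the ODC implementation.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical domain type: anything holding foreign-key references,
    // e.g. Trade -> {Party, SourceBook, Ccy}, Party -> {Alias, LedgerBook}.
    interface Entity {
        String key();
        List<Entity> references();
    }

    class ConnectedReplicator {
        // Stand-in for the replicated 'connected' dimension caches.
        private final Map<String, Entity> connectedDimensions = new HashMap<>();

        // Invoked by the cache-store trigger when a fact (e.g. a Trade) is written.
        void onFactSaved(Entity fact) {
            fact.references().forEach(this::replicateConnected);
        }

        private void replicateConnected(Entity dimension) {
            // putIfAbsent both replicates a dimension the first time it is seen
            // and stops the recursion revisiting parts of the graph already done.
            if (connectedDimensions.putIfAbsent(dimension.key(), dimension) == null) {
                dimension.references().forEach(this::replicateConnected);
            }
        }
    }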

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?






Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned, dimensions are replicated (2)

(Diagram: the Facts (Transactions, Cashflows, MTMs) are distributed across partitioned Fact Storage; the Dimensions are replicated.)

The data volumes back this up as a sensible hypothesis: Facts => Big => Distribute; Dimensions => Small => Replicate.

Key Point

We use a variant on a Snowflake Schema to partition the big entities that can be related via a partitioning key, and replicate the small stuff whose keys can't map to our partitioning key. In short: distribute the Facts, replicate the Dimensions.
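To make the split concrete, here is a minimal sketch, assuming a generic key-value grid and invented entity names (this is not the actual ODC API): facts are keyed by the shared trade id so they collocate in one partition, while the small dimensions are held as full copies on every node.

// A minimal sketch (assumed names, not the real ODC API) of the storage split:
// facts are keyed by the common partitioning key so related facts collocate;
// small dimensions are copied in full onto every node.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SnowflakeStoreSketch {

    // Partitioned facts: everything hangs off the trade id.
    private final Map<Long, Trade> trades = new ConcurrentHashMap<>();
    private final Map<Long, Transaction> transactionsByTrade = new ConcurrentHashMap<>();
    private final Map<Long, Mtm> mtmsByTrade = new ConcurrentHashMap<>();

    // Replicated dimensions: small, with their own keys, a full copy per node.
    private final Map<String, Party> parties = new ConcurrentHashMap<>();
    private final Map<String, Book> books = new ConcurrentHashMap<>();

    public void save(Trade trade, Transaction tx, Mtm mtm) {
        // All three facts share trade.tradeId(), so in a real grid they would
        // hash to the same partition and be stored in the same process.
        trades.put(trade.tradeId(), trade);
        transactionsByTrade.put(trade.tradeId(), tx);
        mtmsByTrade.put(trade.tradeId(), mtm);
    }

    public record Trade(long tradeId, String partyId, String bookId) {}
    public record Transaction(long tradeId, double amount) {}
    public record Mtm(long tradeId, double value) {}
    public record Party(String partyId, String name) {}
    public record Book(String bookId, String name) {}
}

The point of the layout is simply that any join between facts can be resolved inside a single partition, while joins to dimensions never need the network because every node holds them locally.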

So how does this help us to run queries without distributed joins?

This query involves joins between Dimensions and joins between Facts:

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern? A chain of network hops spread over time: get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs, and so on.

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'. Join the Dimensions in the Query Layer to get the right keys to query the Facts.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

(Diagram: the dimension join happens in the Query Layer; Transactions, Cashflows and MTMs sit in Partitioned Storage.)

Stage 2: Cluster join to get the Facts. Join the facts together efficiently, as we know they are collocated.

(Diagram: Dimensions joined in the Query Layer; Facts (Transactions, Cashflows, MTMs) joined across the cluster within Partitioned Storage.)

Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result in the Query Layer.
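A rough sketch of those three stages, assuming an in-process view of the replicated dimensions and the local partition of facts (all type and method names below are invented for illustration, not the real ODC API): the where clause is resolved against replicated dimensions, the collocated facts are joined, and the dimensions are bound onto the result.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative three-stage query over partitioned facts and replicated
// dimensions. All types and names here are assumptions made for the sketch.
public class ThreeStageQuerySketch {

    // Replicated dimensions: available in-process on every query node.
    record Book(String bookId, String costCentre) {}
    record Party(String partyId, String name) {}

    // Partitioned facts: collocated by tradeId.
    record Transaction(long tradeId, String bookId, String partyId, double amount) {}
    record Mtm(long tradeId, double value) {}
    record Result(Transaction transaction, Mtm mtm, Party party) {}

    private final Map<String, Book> books;             // replicated
    private final Map<String, Party> parties;          // replicated
    private final Map<Long, Transaction> transactions; // partitioned by tradeId
    private final Map<Long, Mtm> mtms;                 // partitioned by tradeId

    ThreeStageQuerySketch(Map<String, Book> books, Map<String, Party> parties,
                          Map<Long, Transaction> transactions, Map<Long, Mtm> mtms) {
        this.books = books;
        this.parties = parties;
        this.transactions = transactions;
        this.mtms = mtms;
    }

    List<Result> query(String costCentre) {
        // Stage 1: join the replicated dimensions locally to resolve the
        // where clause into a set of book ids (no network hops needed).
        Set<String> bookIds = books.values().stream()
                .filter(b -> b.costCentre().equals(costCentre))
                .map(Book::bookId)
                .collect(Collectors.toSet());

        // Stage 2: join the facts. Transactions and MTMs for the same trade
        // share a partitioning key, so in the grid this join is node-local.
        List<Transaction> matching = transactions.values().stream()
                .filter(tx -> bookIds.contains(tx.bookId()))
                .collect(Collectors.toList());

        // Stage 3: bind the relevant dimensions onto the raw facts.
        return matching.stream()
                .map(tx -> new Result(tx, mtms.get(tx.tradeId()), parties.get(tx.partyId())))
                .collect(Collectors.toList());
    }
}

In the grid itself stages 1 and 3 run entirely in the query layer's local memory; only stage 2 touches the partitioned storage, and even that join stays inside each partition.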

Bringing it together: a Java client API sits over Replicated Dimensions and Partitioned Facts. We never have to do a distributed join. All the big stuff is held partitioned, and we can join without shipping keys around or holding intermediate results.

We get to do this… (join a Trade to its Party and Trader) …and this… (hold the same Trade, Party and Trader graph as Versions 1 to 4) …and this (link many Trades, Parties and Traders together)…

…without the problems of this… …or this… all at the speed of this… well, almost.

But there is a fly in the ointment… I lied earlier: these aren't all Facts. This one is a dimension: it has a different key to the Facts, and it's BIG. We can't replicate really big stuff, we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty. Looking at the Dimension data, some are quite large, but Connected Dimension Data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused. So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

(Diagram: the Data Layer holds Transactions, Cashflows and MTMs in partitioned Fact Storage; the Processing Layer holds replicated Dimension Caches.)

As new Facts are added, relevant Dimensions that they reference are moved to the processing-layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its first-level references to be triggered.

(Diagram: 'Save Trade' writes to the Partitioned Cache in the Data Layer, which is all normalised; a Cache Store trigger pushes the Trade's referenced Party, Alias, Source Book and Ccy towards the Query Layer's connected dimension caches.)

This updates the connected caches.

(Diagram: the Party, Alias, Source Book and Ccy referenced by the Trade now sit in the Query Layer's connected dimension caches.)

The process recurses through the object graph.

(Diagram: the Party in turn triggers its own references, such as its LedgerBook, so second-level dimensions reach the connected caches as well.)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
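A minimal sketch of that recursion, with invented types and a hypothetical store trigger hook (not the real ODC implementation): when a fact is saved we walk its foreign-key references, copy any dimension not yet present into the replicated connected cache, and recurse through that dimension's own references.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the Connected Replication pattern: only dimensions actually
// referenced ("connected") by stored facts are pushed to the replicated
// caches. All names here are assumptions for illustration.
public class ConnectedReplicationSketch {

    /** Anything with an identity and outgoing foreign-key references. */
    interface Entity {
        String key();
        List<Entity> references(); // the arcs of the domain model
    }

    // Stand-in for the replicated connected-dimension caches.
    private final Map<String, Entity> connectedCache = new ConcurrentHashMap<>();

    /** Called by the data layer's store trigger whenever a fact is saved. */
    public void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicateConnected(dimension);
        }
    }

    private void replicateConnected(Entity dimension) {
        // Only copy dimensions we have not seen before, then recurse through
        // their own references so second-level dimensions are connected too.
        if (connectedCache.putIfAbsent(dimension.key(), dimension) == null) {
            for (Entity next : dimension.references()) {
                replicateConnected(next);
            }
        }
    }
}

Because the recursion stops at dimensions that are already cached, repeated saves only pay for genuinely new connections, which is what keeps the replicated set down to the 'connected' fraction of the data.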

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between Facts that can share a partitioning key (but any dimension join can be supported).

Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)

So how do they help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a chain of network hops spread over time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, each a separate round trip across the network]

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause (Where Cost Centre = 'CC1')

Stage 1: Get the right keys to query the Facts, by joining the Dimensions in the Query Layer.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

[Diagram: Partitioned Storage holding the Facts: Transactions, Cashflows, MTMs]
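
As a rough illustration of Stage 1, here is a minimal Java sketch. It assumes, for illustration, that Cost Centres relate to LedgerBooks which relate to SourceBooks (as in the hop diagram above); the record shapes and names are hypothetical, not the real ODC types. Because every node holds all of the dimensions in replicated caches, resolving the where clause down to Fact keys is pure local map traversal, with no network hops.

import java.util.*;
import java.util.stream.*;

public class Stage1ResolveKeys {

    // Hypothetical dimension records (illustrative only).
    record LedgerBook(String id, String costCentreId) {}
    record SourceBook(String id, String ledgerBookId) {}

    // Replicated dimension caches, assumed already present on this node.
    static final Map<String, LedgerBook> ledgerBooks = Map.of(
            "LB1", new LedgerBook("LB1", "CC1"),
            "LB2", new LedgerBook("LB2", "CC2"));
    static final Map<String, SourceBook> sourceBooks = Map.of(
            "SB1", new SourceBook("SB1", "LB1"),
            "SB2", new SourceBook("SB2", "LB2"));

    // Resolve "Where Cost Centre = 'CC1'" to the source book keys that the
    // partitioned Facts can then be queried by.
    static Set<String> sourceBooksFor(String costCentreId) {
        Set<String> books = ledgerBooks.values().stream()
                .filter(lb -> lb.costCentreId().equals(costCentreId))
                .map(LedgerBook::id)
                .collect(Collectors.toSet());
        return sourceBooks.values().stream()
                .filter(sb -> books.contains(sb.ledgerBookId()))
                .map(SourceBook::id)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(sourceBooksFor("CC1"));   // prints [SB1]
    }
}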

Stage 2: Cluster Join to get Facts. Join Dimensions in the Query Layer, then join Facts across the cluster.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated.

[Diagram: Partitioned Storage holding the Facts: Transactions, Cashflows, MTMs]
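
A sketch of Stage 2, again with hypothetical Fact types: Transactions and MTMs that share a transaction id live in the same partition, so the join below runs entirely within one partition (and in parallel across the cluster), without shipping keys or intermediate results between nodes.

import java.util.*;
import java.util.stream.*;

public class Stage2CollocatedJoin {

    // Hypothetical Facts held in this partition (illustrative only).
    record Transaction(long id, String sourceBookId) {}
    record Mtm(long transactionId, double value) {}
    record TransactionWithMtm(Transaction transaction, Mtm mtm) {}

    static final List<Transaction> transactions = List.of(
            new Transaction(1, "SB1"), new Transaction(2, "SB2"));
    static final List<Mtm> mtms = List.of(new Mtm(1, 100.0), new Mtm(2, 250.0));

    // Join the Facts selected by the Stage 1 keys; both sides are local.
    static List<TransactionWithMtm> join(Set<String> sourceBookIds) {
        Map<Long, Mtm> mtmByTransaction = mtms.stream()
                .collect(Collectors.toMap(Mtm::transactionId, m -> m));
        return transactions.stream()
                .filter(t -> sourceBookIds.contains(t.sourceBookId()))
                .map(t -> new TransactionWithMtm(t, mtmByTransaction.get(t.id())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(join(Set.of("SB1")));   // the 'CC1' transactions with their MTMs
    }
}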

Stage 3: Augment raw Facts with relevant Dimensions. Join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result.
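
And a sketch of Stage 3, using a hypothetical Counterparty dimension: the joined Facts returned from the partitions are decorated with reference data taken from the local replicated dimension caches, so no further network hops are needed.

import java.util.*;
import java.util.stream.*;

public class Stage3BindDimensions {

    // Hypothetical row and dimension shapes (illustrative only).
    record FactRow(long transactionId, String counterpartyId, double mtm) {}
    record Counterparty(String id, String name) {}
    record ResultRow(long transactionId, double mtm, String counterpartyName) {}

    // Replicated dimension cache, local to the query layer.
    static final Map<String, Counterparty> counterparties = Map.of(
            "CP1", new Counterparty("CP1", "Some Counterparty"));

    static List<ResultRow> bind(List<FactRow> facts) {
        return facts.stream()
                .map(f -> new ResultRow(f.transactionId(), f.mtm(),
                        counterparties.get(f.counterpartyId()).name()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(bind(List.of(new FactRow(1, "CP1", 100.0))));
    }
}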

Bringing it together

[Diagram: the Java client calls the API, which uses Replicated Dimensions and Partitioned Facts. We never have to do a distributed join.]

So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.
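
From the client's point of view this could be as simple as the contract sketched below. GridQueryClient and its method are assumed names for illustration, not the actual ODC API; the point is that the whole three-stage plan runs inside the grid, so the caller never performs a distributed join itself.

import java.util.List;
import java.util.Map;

// Illustrative client-side contract; the names are hypothetical.
public interface GridQueryClient {

    // Runs the whole three-stage query inside the grid and returns the bound
    // rows; bind variables fill placeholders such as the cost centre id.
    List<Map<String, Object>> select(String query, Object... bindVariables);
}

A call such as select("Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ?", "CC1") would then return fully joined, dimension-decorated rows.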

We get to do this…

[Diagram: Trades joined to their Party and Trader]

…and this…

[Diagram: the same Trade, Party and Trader graph held as Version 1 through Version 4]

and this

[Diagram: many Trades sharing the same Party and Trader records]

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: the data set split into Facts and Dimensions]

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data, some are quite large.

But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: the Data Layer holds the Fact Storage (Partitioned): Transactions, Cashflows, MTMs. The Processing Layer holds the Dimension Caches (Replicated).]

As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered.

[Diagram: Save Trade goes to the Partitioned Cache, whose Cache Store fires a Trigger. In the Data Layer (all normalised) the Trade references Party Alias, Source Book and Ccy; the Query Layer holds the connected dimension caches.]

This updates the connected caches

[Diagram: the Party Alias, Source Book and Ccy referenced by the saved Trade are copied from the Data Layer (all normalised) into the connected dimension caches in the Query Layer.]

The process recurses through the object graph

[Diagram: the recursion continues through the graph: Party Alias leads to Party, Source Book leads to LedgerBook, and these too are copied into the connected dimension caches in the Query Layer.]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
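
A minimal sketch of that recursion, using hypothetical types rather than the real ODC domain model: when a Fact is written to the partitioned store, a trigger walks its outgoing references and copies each newly seen dimension into the replicated 'connected' caches, recursing until the reachable part of the graph is covered.

import java.util.*;

public class ConnectedReplicationSketch {

    // An arc-bearing entity in the domain model (illustrative only).
    interface Entity {
        String key();
        List<Entity> references();   // the foreign keys, i.e. the arcs of the model
    }

    record Dim(String key, List<Entity> references) implements Entity {}

    // Stand-in for the replicated 'connected' dimension caches.
    static final Map<String, Entity> connectedCaches = new HashMap<>();

    // Trigger fired by the partitioned cache store when a Fact is saved.
    static void onFactSaved(Entity fact) {
        fact.references().forEach(ConnectedReplicationSketch::replicate);
    }

    // Recurse through the object graph, replicating each dimension only once
    // (the putIfAbsent check also stops cycles).
    static void replicate(Entity dimension) {
        if (connectedCaches.putIfAbsent(dimension.key(), dimension) == null) {
            dimension.references().forEach(ConnectedReplicationSketch::replicate);
        }
    }

    public static void main(String[] args) {
        Entity party = new Dim("Party:GS", List.of());
        Entity alias = new Dim("PartyAlias:GS-LDN", List.of(party));
        Entity ccy   = new Dim("Ccy:USD", List.of());
        Entity trade = new Dim("Trade:1", List.of(alias, ccy));   // the Fact
        onFactSaved(trade);
        System.out.println(connectedCaches.keySet());   // alias, party and ccy; the trade itself stays partitioned
    }
}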

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning it can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Stage 3: Augment raw Facts with relevant Dimensions

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

[Diagram: with the Facts (Transactions, Cashflows, MTMs) already joined across the cluster from Partitioned Storage, the Dimensions are now joined in the Query Layer]

Stage 3: Bind relevant dimensions to the result

Bringing it together

[Diagram: a Java client calls the API; Dimensions are Replicated, Facts are Partitioned]

We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.
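
Below is a minimal sketch of those three stages, using plain Java maps to stand in for the grid's replicated and partitioned caches. All class, field and cache names here (Trade, Mtm, Party, costCentreIndex and so on) are illustrative assumptions, not the actual ODC API.

import java.util.*;
import java.util.stream.*;

// Sketch only: HashMaps stand in for the grid's caches.
public class ThreeStageQuerySketch {

    record Trade(long tradeId, String costCentre, String partyId) {}
    record Mtm(long tradeId, double value) {}
    record Party(String partyId, String name) {}
    record Row(Trade trade, Mtm mtm, Party party) {}

    // Replicated dimension data: held in full in every process.
    static final Map<String, Party> partyCache = new HashMap<>();
    // Replicated index from the crosscutting dimension key to Fact keys.
    static final Map<String, Set<Long>> costCentreIndex = new HashMap<>();
    // Partitioned Facts: Trades and MTMs share the trade id as partitioning key,
    // so matching entries would be collocated on the same node in the real grid.
    static final Map<Long, Trade> tradeCache = new HashMap<>();
    static final Map<Long, Mtm> mtmCache = new HashMap<>();

    static List<Row> selectByCostCentre(String costCentre) {
        // Stage 1: resolve the where clause against replicated data to get Fact keys.
        Set<Long> tradeIds = costCentreIndex.getOrDefault(costCentre, Set.of());

        // Stage 2: join the Facts; with a shared partitioning key this join never
        // leaves the owning partition, so no keys or intermediate results are shipped.
        // Stage 3: bind the replicated dimension (Party) to each row locally.
        return tradeIds.stream()
                .map(tradeCache::get)
                .filter(Objects::nonNull)
                .map(t -> new Row(t, mtmCache.get(t.tradeId()), partyCache.get(t.partyId())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        partyCache.put("GS", new Party("GS", "Goldman Sachs"));
        tradeCache.put(1L, new Trade(1L, "CC1", "GS"));
        mtmCache.put(1L, new Mtm(1L, 42.0));
        costCentreIndex.put("CC1", Set.of(1L));
        System.out.println(selectByCostCentre("CC1"));
    }
}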

We get to do this…

[Diagram: a normalised Trade linked to its Party and Trader]

…and this…

[Diagram: the Trade and Party linked to Trader Version 1 through Trader Version 4]

…and this…

[Diagram: many Trades, Parties and Traders linked together]

…without the problems of this…

…or this…

…all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: the model split into Facts and Dimensions]

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: the Data Layer holds Transactions, Cashflows and MTMs in Fact Storage (Partitioned); the Processing Layer holds the Dimension Caches (Replicated)]

As new Facts are added, the relevant Dimensions that they reference are moved to the processing layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its first-level references to be triggered.

[Diagram: a Save Trade call hits the Partitioned Cache in the Data Layer (All Normalised); its Cache Store fires a Trigger, which pushes the Trade's references (Party, Alias, Source, Book, Ccy) towards the Query Layer (with connected dimension Caches)]

This updates the connected caches.

[Diagram: the referenced Party, Alias, Source, Book and Ccy are copied from the Data Layer (All Normalised) into the Query Layer's connected dimension Caches]

The process recurses through the object graph.

[Diagram: the recursion continues from the Trade's references to their own references, for example from Party on to LedgerBook and a further Party]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
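
A minimal sketch of that recursion follows, again with plain Java collections standing in for the partitioned and replicated caches. The Entity interface, the trigger hook and every name below are assumptions for illustration, not the actual ODC code.

import java.util.*;

// Sketch only: when a Fact is saved, walk its foreign-key references recursively and
// copy each referenced Dimension into the replicated 'connected' cache.
public class ConnectedReplicationSketch {

    interface Entity {
        String key();
        List<Entity> references(); // the outgoing arcs of the domain model
    }

    record Dimension(String key, List<Entity> references) implements Entity {}
    record TradeFact(String key, List<Entity> references) implements Entity {}

    // Stands in for the replicated connected-dimension caches in the query layer.
    static final Map<String, Entity> connectedCache = new HashMap<>();

    // Called by the cache-store trigger when a Fact lands in partitioned storage.
    static void onFactSaved(Entity fact) {
        fact.references().forEach(ConnectedReplicationSketch::replicateConnected);
    }

    // Recurse through the object graph, replicating each Dimension at most once.
    static void replicateConnected(Entity dimension) {
        if (connectedCache.putIfAbsent(dimension.key(), dimension) != null) {
            return; // already connected, so already replicated
        }
        dimension.references().forEach(ConnectedReplicationSketch::replicateConnected);
    }

    public static void main(String[] args) {
        Dimension ccy = new Dimension("Ccy:USD", List.of());
        Dimension book = new Dimension("Book:B1", List.of(ccy));
        Dimension party = new Dimension("Party:GS", List.of());
        onFactSaved(new TradeFact("Trade:1", List.of(party, book)));
        // Only the Dimensions reachable from the saved Fact are now replicated.
        System.out.println(connectedCache.keySet());
    }
}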

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?


So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date.

(Diagram: the Data Layer and the Processing Layer; partitioned Fact Storage holds the Transactions, Cashflows and MTMs, while replicated Dimension Caches hold the connected dimensions.)

As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches.
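To make this layout concrete, the fragment below is a minimal sketch of the two kinds of store: a partitioned map for facts, keyed by the single key they share, and replicated maps for the connected dimensions. The class names, the fields, and the use of the trade id as the partitioning key are illustrative assumptions, not the actual ODC API; plain maps stand in for distributed caches.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Facts (Trades, Cashflows, MTMs) are partitioned: each node owns the entries
// whose partitioning key hashes to it. Connected dimensions are replicated:
// every processing node holds a full local copy.
// (Plain maps stand in here for what would be distributed caches in a grid.)
class GridStores {
    // Partitioned fact storage, keyed by the one key all facts share (e.g. the trade id).
    final Map<Long, Trade> partitionedTrades = new ConcurrentHashMap<>();

    // Replicated dimension caches, held on every processing node.
    final Map<String, Party> replicatedParties = new ConcurrentHashMap<>();
    final Map<String, Book>  replicatedBooks   = new ConcurrentHashMap<>();
}

class Trade { long id; String partyId; String bookId; }
class Party { String id; }
class Book  { String id; String costCentreId; }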

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all of its 1st-level references to be triggered.

(Diagram: a Save Trade call writes the Trade into the Partitioned Cache; the Cache Store fires a Trigger. The Trade references Party, Alias, Source, Book and Ccy in the Data Layer (All Normalised); the Query Layer holds the connected dimension caches.)

This updates the connected caches

(Diagram: the connected dimension caches in the Query Layer now contain the dimensions the Trade references: Party, Alias, Source, Book and Ccy.)

The process recurses through the object graph

(Diagram: the recursion continues from the Trade's first-level references (Party, Alias, Source, Book, Ccy) to second-level ones, here a further Party and a LedgerBook.)
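As a rough illustration of the mechanism just described, the sketch below shows a cache-store trigger that, when a Trade fact is saved, walks its references and pushes every dimension it reaches into a replicated cache, recursing through the object graph. The interfaces, method names and map-based cache are illustrative assumptions, not the actual ODC implementation.

import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the Connected Replication trigger: only dimensions reachable from
// a saved fact (i.e. 'connected' dimensions) ever enter the replicated cache.
class ConnectedReplicator {
    // Stand-in for the replicated dimension cache held on every node.
    private final Map<String, Dimension> replicatedDimensions = new ConcurrentHashMap<>();

    // Called by the cache store when a Trade is written to the partitioned cache.
    void onFactSaved(Trade trade) {
        for (Dimension firstLevel : trade.references()) {
            replicate(firstLevel, new HashSet<>());
        }
    }

    // Recurse through the arcs of the domain model, replicating every
    // dimension that is now connected to at least one fact.
    private void replicate(Dimension dim, Set<String> visited) {
        if (!visited.add(dim.key())) return;       // already handled on this pass
        replicatedDimensions.put(dim.key(), dim);  // in a real grid this is broadcast to all nodes
        for (Dimension next : dim.references()) {
            replicate(next, visited);
        }
    }
}

interface Dimension {
    String key();
    List<Dimension> references();   // e.g. Party -> LedgerBook
}

interface Trade {
    List<Dimension> references();   // first-level references: Party, Alias, Source, Book, Ccy
}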

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:

• Data set size: the size of the connected dimensions limits scalability.

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.

• At one end of the scale are the huge shared-nothing architectures. These favour scalability.

• At the other end are in-memory architectures, ideally using a single address space.

• You can blend the two approaches (for example ODC).

• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so that any join can be done in a single step against the Partitioned Storage (see the sketch below).

• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
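As a rough sketch of that single-step join, using the presentation's "Where Cost Centre = 'CC1'" example: resolve the where clause against the locally replicated dimensions, select the matching facts held in the local partition, then bind the dimensions back onto the results. The record types and the method shown are illustrative assumptions, not ODC's query API.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: dimensions are available locally (replicated), facts are
// partitioned, so each node can answer its share of the query without a
// distributed join.
class SingleStepJoin {
    record Book(String id, String costCentreId) {}
    record Trade(long id, String bookId) {}
    record Result(Trade trade, Book book) {}

    List<Result> tradesForCostCentre(String costCentre,
                                     Map<String, Book> replicatedBooks,
                                     Map<Long, Trade> localPartitionOfTrades) {
        // Stage 1: resolve the where-clause against the local, replicated dimensions.
        Set<String> bookIds = replicatedBooks.values().stream()
                .filter(b -> b.costCentreId().equals(costCentre))
                .map(Book::id)
                .collect(Collectors.toSet());

        // Stage 2: select the matching facts held in this partition (no network hop).
        // Stage 3: bind the relevant dimensions back onto each fact from the local cache.
        return localPartitionOfTrades.values().stream()
                .filter(t -> bookIds.contains(t.bookId()))
                .map(t -> new Result(t, replicatedBooks.get(t.bookId())))
                .collect(Collectors.toList());
    }
}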

The End

• Further details online: http://www.benstopford.com

• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30

This updates the connected caches

(Diagram: saving a Trade in the Data Layer (all normalised) pushes its referenced dimensions - Party, Alias, Source, Book, Ccy - into the Query Layer's connected dimension caches.)

The process recurses through the object graph

(Diagram: the recursion continues from the Trade's first-level references - Party, Alias, Source, Book, Ccy - to the dimensions they reference in turn, such as the Alias's Party and the Book's LedgerBook, keeping the Query Layer's connected dimension caches in step with the normalised Data Layer.)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
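A minimal sketch of the recursion in Java, purely illustrative and not ODC's implementation: it assumes a hypothetical DomainObject interface that exposes each entity's foreign-key references, and a plain in-process map standing in for the replicated connected-dimension cache. Saving a fact triggers a recursion that replicates only the dimensions reachable from it.

// Sketch of the Connected Replication recursion (illustrative; not ODC's code).
// Assumed: every domain object can list the objects it references (its foreign keys),
// and a simple map stands in for the replicated connected-dimension cache.
import java.util.*;

interface DomainObject {
    String key();                     // primary key of the entity
    List<DomainObject> references();  // foreign-key references to other entities
}

class Entity implements DomainObject {
    private final String key;
    private final List<DomainObject> refs;
    Entity(String key, DomainObject... refs) {
        this.key = key;
        this.refs = Arrays.asList(refs);
    }
    public String key() { return key; }
    public List<DomainObject> references() { return refs; }
}

public class ConnectedReplicationSketch {
    // Stand-in for the replicated query-layer cache of connected dimensions.
    private final Map<String, DomainObject> connectedCache = new HashMap<>();

    // Called when a fact (e.g. a Trade) is written to the partitioned data layer.
    void onFactSaved(DomainObject fact) {
        for (DomainObject dimension : fact.references()) {
            replicate(dimension, new HashSet<>());
        }
    }

    // Recurse through the object graph, replicating each newly 'connected' dimension.
    private void replicate(DomainObject dimension, Set<String> visited) {
        if (!visited.add(dimension.key())) return;   // guard against cycles
        connectedCache.put(dimension.key(), dimension);
        for (DomainObject next : dimension.references()) {
            replicate(next, visited);
        }
    }

    public static void main(String[] args) {
        DomainObject ledgerBook = new Entity("LedgerBook:7");
        DomainObject book  = new Entity("Book:3", ledgerBook);
        DomainObject party = new Entity("Party:GS");
        DomainObject alias = new Entity("Alias:GS-LON", party);
        DomainObject trade = new Entity("Trade:42", alias, book);

        ConnectedReplicationSketch replicator = new ConnectedReplicationSketch();
        replicator.onFactSaved(trade);   // replicates Alias, Party, Book and LedgerBook only
        System.out.println(replicator.connectedCache.size() + " connected dimensions replicated");
    }
}

In ODC itself the cache would be a replicated layer spread across the grid rather than a local map; the sketch only shows the shape of the recursion through the foreign keys.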

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability.

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.

Conclusion

At one end of the scale are the huge shared-nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC).

Conclusion

ODC attacks the Distributed Join Problem in an unusual way:

Conclusion

By balancing Replication and Partitioning we can do any join in a single step against partitioned storage.

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
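To make the single-step join concrete, here is a second minimal sketch, again in Java and again only illustrative: because facts are partitioned and the small dimensions are replicated, every partition already holds the dimensions its local facts reference, so the 'join' is a local lookup followed by a simple merge of per-partition results. The Partition, Trade and Dimension types below are hypothetical stand-ins, not ODC classes.

// Sketch of a join that never leaves a partition (illustrative only).
// Assumed: Trade facts are partitioned across nodes; small Dimension objects
// are replicated to every partition.
import java.util.*;
import java.util.stream.*;

record Dimension(String key, String name) {}
record Trade(long id, String bookKey, double notional) {}
record EnrichedTrade(Trade trade, Dimension book) {}

class Partition {
    final List<Trade> localFacts = new ArrayList<>();                    // partitioned data
    final Map<String, Dimension> replicatedDimensions = new HashMap<>(); // replicated data

    // The whole join happens inside this partition: filter the local facts,
    // then bind each one to its locally held dimension.
    List<EnrichedTrade> query(String bookKey) {
        Dimension book = replicatedDimensions.get(bookKey);
        return localFacts.stream()
                .filter(t -> t.bookKey().equals(bookKey))
                .map(t -> new EnrichedTrade(t, book))
                .collect(Collectors.toList());
    }
}

public class SingleStepJoinSketch {
    public static void main(String[] args) {
        Dimension book = new Dimension("Book:3", "FX Desk");
        List<Partition> cluster = List.of(new Partition(), new Partition());

        // Every partition holds the replicated dimension; facts are spread by id.
        cluster.forEach(p -> p.replicatedDimensions.put(book.key(), book));
        cluster.get(0).localFacts.add(new Trade(1, "Book:3", 1_000_000));
        cluster.get(1).localFacts.add(new Trade(2, "Book:3", 2_000_000));

        // Scatter the query, join locally in each partition, then merge the results.
        List<EnrichedTrade> result = cluster.stream()
                .flatMap(p -> p.query("Book:3").stream())
                .collect(Collectors.toList());
        System.out.println(result.size() + " enriched trades, no cross-partition join needed");
    }
}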

The End

• Further details online: http://www.benstopford.com

• Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning, we can do any join in a single step

Partitioned Storage
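
As a rough, single-node illustration of that (a sketch only: the class, field and method names are hypothetical, and plain Java maps stand in for the grid's caches rather than the real ODC API), each storage node keeps just its own partition of the Facts but a full local replica of the Dimensions, so both the filter and the 'join' happen in-process:

import java.util.*;
import java.util.stream.*;

// One storage node in the grid (hypothetical names, not the real ODC API).
class StorageNode {

    // Facts (Trades) are partitioned: this node holds only its own shard,
    // keyed by the single partitioning key that all Facts share.
    private final Map<Long, Trade> localTrades = new HashMap<>();

    // Dimensions are replicated: every node holds a full local copy.
    private final Map<String, CostCentre> costCentres = new HashMap<>();
    private final Map<String, Book> books = new HashMap<>();

    // The 'join' is done entirely in-process: filter the local Fact shard,
    // then bind each Trade to its Dimensions with plain map lookups.
    List<EnrichedTrade> tradesForCostCentre(String costCentreId) {
        return localTrades.values().stream()
                .filter(t -> t.costCentreId().equals(costCentreId))
                .map(t -> new EnrichedTrade(t,
                        costCentres.get(t.costCentreId()),
                        books.get(t.bookId())))
                .collect(Collectors.toList());
    }
}

record Trade(long id, String costCentreId, String bookId, double notional) {}
record CostCentre(String id, String name) {}
record Book(String id, String name) {}
record EnrichedTrade(Trade trade, CostCentre costCentre, Book book) {}

A query such as "all Trades for Cost Centre CC1, with their Dimensions" is then sent to every partition in parallel; each node runs tradesForCostCentre locally and the results are concatenated, so the join never crosses the network.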

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
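
Below is a minimal sketch of that twist, again with hypothetical names and ordinary Java collections rather than the real grid API: nothing is replicated up front; instead, writing a Fact promotes just the Dimension records it references into the replicated layer, recursing through the object graph:

import java.util.*;

// Hypothetical sketch of the Connected Replication idea: a Dimension record is
// only promoted into the replicated layer once a Fact that references it is written.
class ConnectedReplicator {

    private final Map<Object, Dimension> replicatedLayer = new HashMap<>(); // stands in for the cluster-wide replicated caches
    private final DimensionStore allDimensions;                             // the full (partitioned) Dimension data

    ConnectedReplicator(DimensionStore allDimensions) {
        this.allDimensions = allDimensions;
    }

    // Called whenever a Fact (e.g. a Trade) is saved to the partitioned layer.
    void onFactWritten(Fact fact) {
        fact.referencedDimensionKeys().forEach(this::replicateIfAbsent);
    }

    // Recurse through the arcs of the object graph: replicating one Dimension may
    // pull in the Dimensions it references in turn (Trade -> Book -> Desk -> ...).
    private void replicateIfAbsent(Object key) {
        if (replicatedLayer.containsKey(key)) {
            return;                              // already connected, already replicated
        }
        Dimension dim = allDimensions.load(key);
        if (dim == null) {
            return;
        }
        replicatedLayer.put(key, dim);
        dim.referencedDimensionKeys().forEach(this::replicateIfAbsent);
    }
}

interface Fact { Collection<Object> referencedDimensionKeys(); }
interface Dimension { Collection<Object> referencedDimensionKeys(); }
interface DimensionStore { Dimension load(Object key); }

Because the Facts actually held in the store reference only a small fraction of the Dimension records that exist, the copy replicated to every node stays small; that is where the order-of-magnitude saving comes from.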

The End

• Further details online: http://www.benstopford.com

• Questions
