
Transcript
Page 1: Cassandra at arkivum

Cassandra at Arkivum

Richard Lowe, Principal Engineer

Arkivum

[email protected]

Page 2: Cassandra at arkivum

About Arkivum

• We offer a safe, secure archive service for digital data

• We use data archiving expertise to keep data for the long term: for years, decades or forever

• Our service allows our customers to meet their compliance needs and asset retention goals whilst focusing on their core business

© Arkivum Limited, 2012


Page 3: Cassandra at arkivum

Our architecture


• A gateway appliance running our software is installed at the customer site, talking across the WAN over a secure VPN to our software in our DCs

• File data is encrypted and stored on a variety of storage media, including SSD, hard disk and tape

• Focus is on maintaining long-term data integrity, not low latency or high availability


Page 4: Cassandra at arkivum

Legacy design

• Original code used an SQL database

• Our knowledge was biased towards RDBMS

• Normalization, JDBC, ACID, mature platform

• The software design assumed SQL

• Indexes and ad-hoc queries gave basic search functionality for relatively little extra effort


Page 5: Cassandra at arkivum

Relational model of a file system

    CREATE TABLE files (
        file_id VARCHAR NOT NULL PRIMARY KEY,
        parent_id VARCHAR NOT NULL,
        name VARCHAR NOT NULL,
        size BIGINT DEFAULT 0,
        created_date DATETIME DEFAULT CURRENT_TIMESTAMP,
        modified_date DATETIME DEFAULT CURRENT_TIMESTAMP,
        owner_uid INT DEFAULT 0,
        owner_gid INT DEFAULT 0,
        file_mode INT DEFAULT 493,  -- 493 decimal = 0755 octal (rwxr-xr-x)
        file_attr INT DEFAULT 0,
        UNIQUE(parent_id, name)
    );


Page 6: Cassandra at arkivum

Relational model of a file system

Get a file by id

    SELECT * FROM files WHERE
        file_id = 'f90b3e92-0e96-482f-b4e5-f1ca071f26d6';

List all files in a particular directory

    SELECT * FROM files WHERE
        parent_id = 'e98eaaaa-07a6-4ffa-bd21-f3975529718b';

List all files modified in April 2010 and sort by size

    SELECT * FROM files WHERE
        modified_date >= '2010-04-01'
        AND modified_date < '2010-05-01'
        ORDER BY size DESC;
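As a rough illustration of how little code these queries need on the relational side, here is a minimal sketch using Python's built-in sqlite3 module. SQLite standing in for the RDBMS, the trimmed-down schema, the sample rows and the helper names are all assumptions made for the example. The date filter is written as a half-open range so that only April is matched, and the sort uses the `size` column from the schema.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the RDBMS, and
# only a few columns of the schema are kept. The rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE files (
        file_id       TEXT NOT NULL PRIMARY KEY,
        parent_id     TEXT NOT NULL,
        name          TEXT NOT NULL,
        size          INTEGER DEFAULT 0,
        modified_date TEXT DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (parent_id, name)
    )
    """
)
conn.executemany(
    "INSERT INTO files (file_id, parent_id, name, size, modified_date) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("f1", "d1", "a.txt", 100, "2010-04-02 09:00:00"),
        ("f2", "d1", "b.txt", 300, "2010-04-20 17:30:00"),
        ("f3", "d2", "c.txt", 200, "2010-03-31 23:59:59"),
        ("f4", "d2", "d.txt", 50, "2010-05-01 00:00:00"),
    ],
)


def files_in_directory(parent_id):
    """Names of all files whose parent is the given directory."""
    cur = conn.execute(
        "SELECT name FROM files WHERE parent_id = ? ORDER BY name",
        (parent_id,),
    )
    return [row[0] for row in cur]


def modified_in_april_2010_by_size():
    """Names of files modified in April 2010, largest first."""
    cur = conn.execute(
        "SELECT name FROM files "
        "WHERE modified_date >= '2010-04-01' "
        "AND modified_date < '2010-05-01' "
        "ORDER BY size DESC"
    )
    return [row[0] for row in cur]


print(files_in_directory("d1"))
print(modified_in_april_2010_by_size())
```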


Page 7: Cassandra at arkivum

Why Cassandra?

• Scalability

• Meets our need to scale to billions of records

• Designed for high-availability, high-throughput environments

• Replication

• Data safety is paramount to us

• Cassandra replication is a really strong feature

• Stability

• Well supported and used worldwide in high-profile, high-end production systems


Page 8: Cassandra at arkivum

Cassandra model of a file system

Approach 1: Pretend we're using a relational database

• Use column families as if they're tables

• Use CQL because it's like SQL

• Create secondary indexes for everything in case we want to query on it later


Files CF

Row key: file_id

    Column name    Value type
    "parent_id"    UUID
    "name"         UTF8
    "size"         Long
    "modified"     Long
    "accessed"     Long
    "gid"          Long
    "uid"          Long
    "mode"         Long


Page 9: Cassandra at arkivum

Cassandra model of a file system

Approach 1 doesn't work

• Column families are not tables

• CQL looks like SQL, but isn't

    SELECT * FROM Files WHERE modified > '2010-03-31';

• Secondary indexes aren't cheap

• Can't sort based on column values, only on column names


Page 10: Cassandra at arkivum

Cassandra model of a file system

Approach 2: Use composite types and blobs

• Serialize file record and store as single object instead of multiple values

• Use actual values as part of composite column name, so we can search and sort based on them


Files CF

Row key: (parent_id, file_id)

Column name: (name, size, mtime, atime, gid, uid, mode)

Column value: file_blob


Page 11: Cassandra at arkivum

Cassandra model of a file system

Approach 2 doesn't work either

• Need to know all the values for a composite to query based on it - otherwise it means a range query, which is expensive

    file_exists = len(list(files_cf.get_range(
        start=CompositeType(MIN_UUID, file_id),
        finish=CompositeType(MAX_UUID, file_id),
        row_count=1))) == 1

• Sorting compares the entire composite, not each field

    [CompositeType('apples', 6), CompositeType('bananas', 2),
     CompositeType('oranges', 5), CompositeType('pears', 4)]
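The sorting point can be tried out without a cluster. In the sketch below, plain Python tuples stand in for composite values (an assumption made for illustration: tuples compare component by component from the left, which is similar in spirit to how a composite is ordered, but this is not Cassandra code). The first component dominates the order, so getting the items ordered by the second component means reading them all and re-sorting on the client.

```python
# Assumption: plain tuples (name, count) stand in for composite values.
composites = [("pears", 4), ("apples", 6), ("oranges", 5), ("bananas", 2)]

# Component-by-component comparison: the name decides the order and the
# count would only break ties between identical names.
stored_order = sorted(composites)

# Ordering by the second component needs every item fetched first,
# followed by a client-side sort on that component.
by_count = sorted(composites, key=lambda item: item[1])

print(stored_order)
print(by_count)
```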


Page 12: Cassandra at arkivum

Cassandra model of a file system

Approach 3: De-normalize

• Look at the most common queries and optimize for those

• Most lookups should require just a single get or slice query

• Speed vs. space: do we really care if a record is stored twice?


Files CF

Row key: file_id

Column "file": file_blob

Directories CF

Row key: parent_id

Column name (the file's name): file_blob


Page 13: Cassandra at arkivum

Cassandra model of a file system

Approach 3 works

Get a file by id

    file = unpackFile(
        files_cf.get(key=file_id, columns=['file']))

List all files in a particular directory

    files = unpackFiles(list(
        directories_cf.get(key=directory_id)))
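To make the de-normalized layout concrete, here is a minimal in-memory sketch. Plain dicts stand in for the two column families, JSON stands in for the blob serialization, and the helper names (`pack_file`, `unpack_file`, `put_file`, `get_file`, `list_directory`) are hypothetical; none of this is the production code or a Cassandra client. The point it illustrates is that each write stores the same blob twice, so that each of the two common lookups is a single keyed read.

```python
import json

# Assumption: dicts stand in for the two column families.
files_cf = {}        # row key: file_id   -> {"file": blob}
directories_cf = {}  # row key: parent_id -> {file name: blob}


def pack_file(record):
    """Serialize a file record into a single blob (JSON is an assumption)."""
    return json.dumps(record, sort_keys=True)


def unpack_file(blob):
    """Turn a stored blob back into a file record."""
    return json.loads(blob)


def put_file(record):
    """Store the same blob under both keys: space is traded for speed."""
    blob = pack_file(record)
    files_cf[record["file_id"]] = {"file": blob}
    directories_cf.setdefault(record["parent_id"], {})[record["name"]] = blob


def get_file(file_id):
    """Single keyed read from the Files CF."""
    return unpack_file(files_cf[file_id]["file"])


def list_directory(parent_id):
    """Single row read from the Directories CF, columns ordered by name."""
    row = directories_cf.get(parent_id, {})
    return [unpack_file(row[name]) for name in sorted(row)]


put_file({"file_id": "f2", "parent_id": "d1", "name": "b.txt", "size": 300})
put_file({"file_id": "f1", "parent_id": "d1", "name": "a.txt", "size": 100})

print(get_file("f1"))
print([f["name"] for f in list_directory("d1")])
```

The cost of this design is that every update has to touch both places, which is the speed-versus-space trade-off the slide refers to.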


Page 14: Cassandra at arkivum

Lessons learned

• CQL isn't necessarily the easiest or best interface

• Break the golden rule

• Composites are useful under limited circumstances

• Avoid wide rows, they can lead to pain

• Should focus on queries that are most important

• Post-processing or Map/Reduce can be used to meet needs of less common queries
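As an illustration of the post-processing option, the sketch below answers a less common question ("files modified in April 2010, largest first") by filtering and sorting records on the client after they have been fetched. The record layout, the sample data and the function name `modified_between` are assumptions for the example; a real deployment might do the same work in a Map/Reduce job instead.

```python
from datetime import datetime

# Assumption: made-up records, as they might look after unpacking blobs.
records = [
    {"name": "a.txt", "size": 100, "modified": datetime(2010, 4, 2, 9, 0)},
    {"name": "b.txt", "size": 300, "modified": datetime(2010, 4, 20, 17, 30)},
    {"name": "c.txt", "size": 200, "modified": datetime(2010, 3, 31, 23, 59)},
    {"name": "d.txt", "size": 50, "modified": datetime(2010, 5, 1, 0, 0)},
]


def modified_between(files, start, end):
    """Keep files modified in the half-open range [start, end), largest first."""
    hits = [f for f in files if start <= f["modified"] < end]
    return sorted(hits, key=lambda f: f["size"], reverse=True)


april = modified_between(records, datetime(2010, 4, 1), datetime(2010, 5, 1))
print([f["name"] for f in april])
```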


Page 15: Cassandra at arkivum

Cassandra and network usage

10Mbit connection, replicating to 2 nodes


Page 16: Cassandra at arkivum

Cassandra and network usage

So how can it be used on a slow WAN?

• Tune down the message and packet size

    rpc_send_buff_size_in_bytes
    rpc_recv_buff_size_in_bytes
    thrift_framed_transport_size_in_mb
    thrift_max_message_length_in_mb

• Be prepared for higher failure rates when things get busy

    rpc_timeout_in_ms

• Use an additional cache layer to reduce network I/O
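One possible shape for such a cache layer is a small read-through LRU cache in front of the remote read, sketched below. The class name `ReadThroughCache`, the placeholder `fetch_from_cluster` function and the size limit are assumptions for illustration only; the slides do not say how the actual cache was built. Repeated reads of the same key are served locally, so only misses cross the WAN.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Small LRU cache in front of a slow fetch function (illustrative)."""

    def __init__(self, fetch, max_entries=1024):
        self._fetch = fetch
        self._max_entries = max_entries
        self._entries = OrderedDict()
        self.misses = 0  # number of reads that had to go to the backend

    def get(self, key):
        if key in self._entries:
            # Hit: mark as most recently used, no network round trip.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Miss: fetch from the backend and remember the result.
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value

    def invalidate(self, key):
        """Drop a key after a write so the next read is fresh."""
        self._entries.pop(key, None)


def fetch_from_cluster(key):
    # Hypothetical placeholder for a remote read across the WAN.
    return "record-for-" + key


cache = ReadThroughCache(fetch_from_cluster, max_entries=2)
cache.get("a")
cache.get("a")
cache.get("b")
print(cache.misses)
```

A cache like this only helps if writes invalidate or update the cached entry; otherwise readers can see stale records.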


Page 17: Cassandra at arkivum

Cassandra and network usage


10Mbit connection, replicating to 2 nodes, after tuning


Page 18: Cassandra at arkivum

Cassandra and network usage


Cassandra replication is better than DIY alternative


Page 19: Cassandra at arkivum

Configuring Cassandra is key

Cassandra has lots of configuration options.

Taking time to understand and tweak them is worth the effort. Leaving them as default probably won't give the best results.

Determine custom policies for how often to compact, repair, scrub, etc. as these depend on the profile of the data being stored.


Page 20: Cassandra at arkivum

Future work

Continuing to scale our systems to cope with growing load and data volumes

Adding additional search capabilities

Applying analytics to better understand how people are using our service to store petabytes of data


Page 21: Cassandra at arkivum

Summary

• Arkivum provides a guaranteed service for long-term data archive

• We've transitioned our data model from RDBMS to Cassandra

• Our Cassandra deployment is multi-DC, multi-site across WAN

• Future tasks include improving search and using analytics


Page 22: Cassandra at arkivum

Questions?

[email protected]

www.arkivum.com

Page 23: Cassandra at arkivum

Cassandra cheat sheet
