NoSQL replacement for SQLite

(for Beatstream)

Antti-Jussi Kovalainen

Seminar OHJ-1860: NoSQL databases

Background

Inspiration:

postgresapp.com

demo.beatstream.fi (modern desktop browsers without Flash block)

Backend

• Very light

– Access & modify data

– Relay commands to Last.fm

• Currently Ruby on Rails

• So, we’ll focus on the backend

My Super Complex Data Model

[Diagram: Users, Playlists, Songs. Annotations: “Copy the song row”, “Relation?”]

Data

Users

• Few users

• Not many fields: username, email, password, lastfm_key

• Read when user logs in

• Write rarely

• Because users have their own playlists

Songs

• Huge list of objects (array)

• Read after user logs in

• Write rarely

• Basically just meta-data about a song: title, artist, album, tracknum, path, etc…

Playlists

• Possibly huge lists of objects (arrays)

• Read a good amount during a day

• Write a lot (probably)

• Owned by a user

• Contains songs

– Also need to track song’s position on the playlist
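The three kinds of data above can be sketched as plain records. This is a minimal illustration in Python (Beatstream itself runs on Ruby on Rails), not the actual model: the class names, the playlist `name` field and the sample values are assumptions, and the song position is simply the index in the playlist's list. Each playlist embeds copies of its songs, in the spirit of "Copy the song row".

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class User:
    username: str
    email: str
    password: str  # the slide only names the field; store a hash in practice
    lastfm_key: str


@dataclass
class Song:
    title: str
    artist: str
    album: str
    tracknum: int
    path: str


@dataclass
class Playlist:
    owner: str  # username of the owning user
    name: str   # hypothetical field, not listed on the slides
    songs: list = field(default_factory=list)  # list order = song position

    def add(self, song: Song) -> None:
        # Appending keeps the position implicit: it is the list index.
        self.songs.append(song)


playlist = Playlist(owner="alice", name="Morning")
playlist.add(Song("Intro", "Some Artist", "Some Album", 1, "/music/intro.mp3"))
playlist.add(Song("Outro", "Some Artist", "Some Album", 2, "/music/outro.mp3"))

# The whole playlist serializes to a single JSON document.
document = json.dumps(asdict(playlist))
```

Keeping the position implicit in the list order avoids a separate position column, at the cost of rewriting the whole playlist document on every change.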

SQLite

Why I chose SQLite

• Easy-to-use, simple, familiar

• Ready-to-use on new Rails project

• A simple data model gets a simple DBMS?

• Can easily implement CRUD, playlist sorting, etc.

• Great for rapid prototyping

• Doesn’t require separate server installation

Why Replace SQLite?

Why Replace?

1. Fear of Bad Concurrency

– Multiple users + SQLite = Bad memories

• Writing playlists takes time

• Writing the songs list takes time

• SQLite locks up or corrupts data

– Sadness

2. Try something new

– Schemaless == even better for prototyping?
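The locking half of that fear is easy to reproduce. The sketch below uses Python's standard `sqlite3` module; the table name and the zero busy timeout are assumptions chosen so the failure shows up immediately. It demonstrates only the "locks up" case, not data corruption.

```python
import os
import sqlite3
import tempfile


def demo_lock():
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
    # timeout=0: fail at once instead of waiting for the lock.
    # isolation_level=None: autocommit, so transactions are explicit.
    writer_a = sqlite3.connect(path, timeout=0, isolation_level=None)
    writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)
    writer_a.execute("CREATE TABLE playlists (id INTEGER PRIMARY KEY, name TEXT)")

    # Writer A starts a (slow) write and holds the write lock.
    writer_a.execute("BEGIN IMMEDIATE")
    writer_a.execute("INSERT INTO playlists (name) VALUES ('a')")

    # Writer B tries to write to the same file while A is still busy.
    try:
        writer_b.execute("INSERT INTO playlists (name) VALUES ('b')")
        blocked = False
    except sqlite3.OperationalError as exc:
        blocked = "locked" in str(exc)

    # Once A commits, B can write again.
    writer_a.execute("COMMIT")
    writer_b.execute("INSERT INTO playlists (name) VALUES ('b')")
    count = writer_b.execute("SELECT COUNT(*) FROM playlists").fetchone()[0]
    writer_a.close()
    writer_b.close()
    return blocked, count


blocked, count = demo_lock()
```

SQLite allows only one writer per database file at a time, so a long playlist or song-list write makes every other writer wait or fail.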

Sidenote

At The Moment

• Moved to JSON files

– Songs and Playlists are in JSON files

– Users are in SQLite

• SQLite was just getting in the way

– SQLite <--> SQL result <--> JSON

– Keep It Simple

• But this feels kinda icky
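Part of the "icky" feeling with plain JSON files is that a crash in the middle of a write can leave a half-written file. A common mitigation, sketched here in Python with hypothetical helper names, is to write to a temporary file and rename it over the old one. This is an illustration of the idea, not Beatstream's code.

```python
import json
import os
import tempfile


def save_json(path, data):
    """Write data to path so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_json(path, default):
    """Read a JSON file, returning default if it does not exist yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default


songs_path = os.path.join(tempfile.mkdtemp(), "songs.json")
songs = load_json(songs_path, default=[])
songs.append({"title": "Intro", "artist": "Some Artist", "path": "/music/intro.mp3"})
save_json(songs_path, songs)
```

This still gives no protection against two processes overwriting each other's changes, which is one of the things a real database is for.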

songs.json

Over 9000 objects

Back to regular programme

Features We Want

• Standalone / embeddable / portable

– Can embed into application, invisible to Beatstream server user or admin

– No separate server installation etc.

• Simple, easy-to-use

• SQLite-like performance or better

• Lightweight

• No availability, concurrency or consistency problems

– The database can’t be the reason a song won’t play

– When adding a song to a playlist, it should be there and be there always

– The frontend asks only once which songs are in the DB and they should be there always

Features We Want (2)

• Can store whole song library

– Also read it all fast (or we cache it)

– No need for sorting

• Can store sort information somehow

– Sorting of playlists & playlist songs

– Not as obvious as it sounds

• Can fetch only certain user’s playlists

– Relational data!

• (Please, work with Ruby or JRuby)
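The two less obvious requirements, sort information and per-user playlists, can be sketched on plain documents. The field names (`owner`, `position`, `songs`) and both helper functions are assumptions made for illustration, written in Python rather than Ruby.

```python
def playlists_for(playlists, username):
    """Return one user's playlists, ordered by their stored position."""
    owned = [p for p in playlists if p["owner"] == username]
    return sorted(owned, key=lambda p: p["position"])


def move_song(playlist, old_index, new_index):
    """Reorder a playlist in place; the list index is the song's position."""
    song = playlist["songs"].pop(old_index)
    playlist["songs"].insert(new_index, song)


playlists = [
    {"owner": "alice", "name": "Gym", "position": 1, "songs": ["a", "b", "c"]},
    {"owner": "bob", "name": "Road trip", "position": 0, "songs": ["d"]},
    {"owner": "alice", "name": "Morning", "position": 0, "songs": ["e"]},
]

alice_lists = playlists_for(playlists, "alice")
move_song(playlists[0], old_index=2, new_index=0)  # drag "c" to the top
```

Song order lives in the list itself, while the order of the playlists needs an explicit field because each playlist is a separate document.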

NoSQL

CAP

• Consistency

• Availability

• Partition tolerance

Screw CAP

Why NoSQL?

• Data is simple, just lists of objects

– Not relational data

– Songs, Playlists, Users

– Don’t need big queries, joins, analytics, versioning, or anything super-

– Document-oriented systems seem nice for this

• Maybe JSON-oriented?

Why NoSQL? (2)

“try to limit the work done over your data and just store it, then retrieve it and show it to the user, do not over process the information. Manipulate JSON on the user interface and send it to the database with few or even none modification.”

– djondb (http://djondb.com/documentation.html)

I like this idea.

Why NoSQL? (3)

• Standalone / embeddable / portable

– Most SQLite replacement suggestions were NoSQL systems

• Schemaless

– Think different

Why NoSQL? (4)

• NoSQL systems usually concentrate on performance and scalability

– I’m not really concerned about those things right now

– Maybe I should not pick NoSQL then?

Why NoSQL? (5)

• Try new things

• Experiment

• It’s what the cool kids use

• And in the end…

– Tech doesn’t matter. Until it does.

Choices

Criteria

STOP

Criteria

• Beatstream’s backend is small-scale with:

– Less than 100,000 rows

– Performance rarely a problem

– No horizontal scaling

And I’m stressing over database choice?

Criteria

• Standalone / embeddable / portable (no separate server software)

• Lightweight

• Keeps It Simple

• Can store and access our data easily

– Key-value, document-oriented, column-oriented, or something which fits

• Good performance

• No availability, concurrency or consistency problems

• (Works with Ruby, or with Java for use with JRuby)

Criteria (2)

In the future:

• Someone might create a Spotify competitor using Beatstream with millions of users

• Scaling etc. becomes important then, but it’s not important now

Choices

Key-value

• Kyoto Cabinet

• LevelDB

Choices (2)

Document-oriented

• MongoDB

• CouchDB

• RavenDB

• Terrastore

Choices (3)

Other

• db4o

Findings

Kyoto Cabinet

• Key-value store

• Standalone file-based database (also in-memory)

• Support for many languages (Ruby, Java, C#, PHP, etc.)

• Popular (community)

• Hash table or B+ tree based

– Can’t decide which one would be better for Beatstream, have to test

– Hash table: keys are in no particular order; not an issue, since sorting happens in the frontend

Source: http://fallabs.com/kyotocabinet/

Kyoto Cabinet (2)

Notes:

• Replace SQLite on apps that store simple data

• How do I store song and playlist data in a key-value store?

– Need two collections/tables: songs and playlists

– Own database files for songs and playlists?

– key: filepath --> value: song meta-data as JSON?
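The "key: filepath, value: JSON" idea can be tried out without installing anything. The sketch below uses Python's standard `dbm.dumb` module purely as a stand-in for a file-based key-value store; it is not the Kyoto Cabinet API, and the helper names and sample song are assumptions.

```python
import dbm.dumb
import json
import os
import tempfile


def put_song(db, song):
    # Key: the file path. Value: the song's meta-data as a JSON string.
    db[song["path"]] = json.dumps(song)


def get_song(db, path):
    raw = db.get(path)
    return None if raw is None else json.loads(raw)


workdir = tempfile.mkdtemp()

# One database file per "table": this one holds songs; playlists would get
# their own file, keyed by something like "owner/playlist name".
with dbm.dumb.open(os.path.join(workdir, "songs"), "c") as songs_db:
    put_song(songs_db, {
        "path": "/music/intro.mp3",
        "title": "Intro",
        "artist": "Some Artist",
        "tracknum": 1,
    })
    found = get_song(songs_db, "/music/intro.mp3")
    missing = get_song(songs_db, "/music/nope.mp3")
```

Looking up a song by path is trivial here; anything else, such as "all songs by this artist", means reading every value or maintaining a second key scheme by hand.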

LevelDB

• Key-value store

• Standalone file-based database

• Support for many languages (C/C++, Ruby, Java)

• Built by Google; used in Google Chrome (as the IndexedDB backend)

• Sorting by key

• Fast write & read, slow if value is large

Sources: http://code.google.com/p/leveldb/, http://en.wikipedia.org/wiki/LevelDB

LevelDB (2)

Notes:

• Same as with Kyoto Cabinet: good for simple use, but how do I use key-value with Beatstream?
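One possible answer relies on keys being sorted: encode owner, playlist and position into the key, then read a playlist with a prefix scan. The class below is a tiny in-memory stand-in written in Python, not the LevelDB API, and the key layout is an assumption made up for this sketch.

```python
import bisect


class SortedStore:
    """In-memory stand-in for a store that keeps its keys sorted."""

    def __init__(self):
        self._keys = []    # always kept in sorted order
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]


store = SortedStore()
# Hypothetical key layout: playlist:<owner>:<playlist>:<zero-padded position>
store.put("playlist:alice:morning:0002", "/music/b.mp3")
store.put("playlist:bob:gym:0001", "/music/c.mp3")
store.put("playlist:alice:morning:0001", "/music/a.mp3")

morning = [path for _, path in store.scan_prefix("playlist:alice:morning:")]
alice_keys = [key for key, _ in store.scan_prefix("playlist:alice:")]
```

The zero-padded position makes the key order match the playlist order, and a shorter prefix returns everything one user owns. Reordering a song then means rewriting several keys, which is one way the sorting requirement turns out less obvious than it sounds.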

MongoDB

• Document-oriented

– JSON-style documents!

– Collections are logical and easy!

• For many languages (C/C++, Ruby, Java, C#, PHP, …)

• Easy to use, simple

– “Mongo is a schemaless relational database” – Some people

– Indexes instead of map/reduce functions

• Active community

– Lots of plugins, etc.

MongoDB (2)

Notes:

• Crash right after successful write: might lose data

• Embedding is not simple, need to build/find a C++ wrapper & launch DB process in app.

source: http://stackoverflow.com/questions/6115637/can-mongodb-be-used-as-an-embedded-database

CouchDB

• Document-oriented

• For many languages

CouchDB (2)

Notes:

• I would have an HTTP API accessing an HTTP API?

• Embedding is hard

– Need to install Erlang somehow on the user’s computer

RavenDB

• Document-oriented

• Standalone directory/file-based database

• For .NET and JavaScript (Node.js)

• Detailed info on how RavenDB works, listen:

http://herdingcode.com/wp-content/uploads/HerdingCode-0083-Ayende-Rahien-on-RavenDB.mp3

RavenDB (2)

Notes:

• No Ruby support :(

Terrastore

• Document-oriented

• For Java, maybe can attach to JRuby?

• Main feature is scalability without sacrificing consistency

• Seems easy to use

Terrastore (2)

Notes:

• Could not find how to run standalone…

db4o

• Object database

• For Java

• Simple, easy

• Sidenote: Works on Android out-of-the-box

db4o (2)

Notes:

• Known issue: objects sometimes duplicate themselves

Eliminated Choices

• Berkeley DB

– No time to investigate…

• Cassandra

– “the right choice when you need scalability and high availability”

• SimpleDB

– “Optimized to provide high availability”

– Not really standalone / embeddable / portable, but in the cloud and “invisible”

• djondb

– Not standalone / embeddable / portable

• Couchbase

– Could not find a way to run it embedded; maybe it’s the same as with CouchDB

Conclusions

• NoSQL systems promote their horizontal scalability, replication, sharding, etc.

– Features I don’t really care about right now

• Feels like I’m looking at the wrong thing in the wrong place (for Beatstream at least)

– Only time will tell

Conclusions (2)

MongoDB

CouchDB

Kyoto Cabinet

Conclusions (3)

Simple key-values in SQLite:

• Kyoto Cabinet and LevelDB seem like excellent replacements

– Use cases: queue, word dictionary, user database, document database, session management, CMS cache

Conclusions (4)

More complex relations:

• Have a look at MongoDB, CouchDB or RavenDB (.NET)

Extra

• Later: convert the Users table in SQLite to the new NoSQL database

– Songs can be re-created

– Playlists is a new feature, hasn’t been released

Extra (2)

• Redis could be embeddable:

“[communicate over unix domain socket] you can fork your main process, then run one of the exec*() functions in the child to start Redis.”

source: http://code.google.com/p/redis/issues/detail?id=276
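The fork-then-exec pattern from the quote looks roughly like this on a POSIX system. This is a sketch in Python with hypothetical helper names; the demo launches a throwaway Python child instead of Redis so that it runs anywhere, and the Redis command line in the comment (including the socket path) is an assumption, not something tested here.

```python
import os
import sys


def spawn(argv):
    """Fork, then replace the child process image with the given program."""
    pid = os.fork()
    if pid == 0:
        # Child process: exec never returns on success.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec itself failed
    return pid  # parent process: remember the child's pid


def wait_exit_code(pid):
    """Block until the child exits and return its exit code (-1 if killed)."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


# For Redis the call might look like this (assumed options and path):
#   spawn(["redis-server", "--port", "0",
#          "--unixsocket", "/tmp/beatstream-redis.sock"])
child = spawn([sys.executable, "-c", "raise SystemExit(7)"])
exit_code = wait_exit_code(child)
```

The application would keep the pid around so it can stop the database process when it shuts down. The database still runs as a separate process, so this is bundling rather than true embedding.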

Thx!

ajk.im

@darep

Links!

http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

http://blog.nahurst.com/visual-guide-to-nosql-systems

http://www.cs.tut.fi/~tjm/seminars/nosql2012/NoSQL-Intro.pdf

https://speakerdeck.com/u/kplawver/p/nosql-an-introduction

http://fallabs.com/kyotocabinet/rubydoc/

http://blog.creapptives.com/post/8330476086/leveldb-vs-kyoto-cabinet-my-findings

http://herdingcode.com/wp-content/uploads/HerdingCode-0083-Ayende-Rahien-on-RavenDB.mp3