A walk down NOSQL Lane in the cloud

Post on 17-May-2015

Introduction to NOSQL and various NOSQL solutions.

A Walk down NOSQL Lane in the Cloud

New York City Cloud Computing Group

February 2011

Alexander Sicular

@siculars

Who is this blowhard?

Columbia University pays my mortgage

For the better part of a decade in Medical Informatics

Am not shilling for any of these companies

Am not a computer scientist

Am a computer science enthusiast particularly in the area of Informatics

When I put my data in the “cloud”, to me it just means that it’s virtualized in someone else’s server room

Many, many providers and only growing

Amazon, Rackspace, Joyent, CouchOne, Cloudant, Azure, GAE, Heroku, no.de

Outsourced management

Zero capex

Controlled costs

...the Silver Lining

...With a Chance of Rain?

Vendor lock in

Unreliable performance

i/o

cpu, memory

Bare metal > software virtualization

NoSQL or NOSQL?

Not Only SQL

Non/post relational

Big tent policy

Umbrella term

Fragmented

http://www.flickr.com/photos/morgennebel/2933723145/

Your Usage Patterns

Read vs. Write

Mutable vs. Immutable

Product Considerations:

In place updates

Write Only Logs
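The difference between in-place updates and write-only (append-only) logs can be sketched in a few lines. This is a minimal illustration, not how any product listed in this talk is implemented; the class names `InPlaceStore` and `AppendOnlyStore` are hypothetical and everything lives in memory.

```python
class InPlaceStore:
    """Mutable storage: a put overwrites the old value, which is then gone."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class AppendOnlyStore:
    """Immutable records: a put appends to a log and never rewrites it.

    An in-memory index remembers where the newest record for each key sits.
    """

    def __init__(self):
        self.log = []    # (key, value) records, oldest first
        self.index = {}  # key -> offset of the latest record in self.log

    def put(self, key, value):
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def get(self, key):
        offset = self.index.get(key)
        return None if offset is None else self.log[offset][1]

    def history(self, key):
        # Older versions stay in the log until something compacts it.
        return [value for k, value in self.log if k == key]
```

The append-only version trades disk space (and an eventual compaction step, not shown) for sequential writes and a free version history, which is why the read/write and mutable/immutable questions above matter when picking a product.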

This vs. That

Riak wiki comparisons page: http://wiki.basho.com/Riak-Comparisons.html

Popular one page comparison of a number of NOSQL players by Kristof Kovacs: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

Why NOSQL

Support for “Very Large” data sets

Schemaless

Denormalized

Green field

New applications

http://www.flickr.com/photos/gailtang/1243984297/

Academia

Google:

Bigtable http://labs.google.com/papers/bigtable.html

GFS http://labs.google.com/papers/gfs.html

M/R http://labs.google.com/papers/mapreduce.html

Amazon:

Dynamo http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

NOSQL Summer http://nosqlsummer.org/papers

Under the Hood Terminology

Write Only Log http://en.wikipedia.org/wiki/Log-structured_file_system

Merkle Trees http://en.wikipedia.org/wiki/Hash_tree

B-trees http://en.wikipedia.org/wiki/B-tree

Vector clock http://en.wikipedia.org/wiki/Vector_clock

Bloom filters http://en.wikipedia.org/wiki/Bloom_filters

Big O Notation http://en.wikipedia.org/wiki/Big_o_notation

Consistent Hashing http://en.wikipedia.org/wiki/Consistent_hashing
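Of the terms above, consistent hashing is probably the easiest to show in code. The following is a rough sketch under stated assumptions: the `HashRing` class, the use of MD5, and the 64 virtual points per node are illustrative choices, not taken from any of the products discussed here.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Any well-distributed hash works; MD5 is used only for convenience.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

The property worth noticing: when a node leaves, only the keys that node owned move elsewhere. Keys owned by the surviving nodes stay put, which is what makes adding and removing machines cheap compared with `hash(key) % number_of_nodes`.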

CAP Theorem

http://en.wikipedia.org/wiki/CAP_theorem

Consistency

Availability

Partition Tolerance

Pick two?

http://guide.couchdb.org/draft/consistency.html

CouchDB

CouchOne, Cloudant

Erlang

Extreme replication scenarios

Works on phones

Updated indexing (b-tree)

HTTP interface

Offline usage

Sharded scaling

CouchDB Internal Architecture

http://nosqlpedia.com/wiki/File:CouchDB-Arch.JPG

MongoDB

10gen, MongoHQ, MongoLab

C++

huMONGOus

Sharded scaling, replicated master/slave

Located in NYC (go visit them)

Soft landing for those coming from mysql (relational databases)

Native javascript

Secondary indexes

MongoDB Sharding Diagram

http://www.snailinaturtleneck.com/blog/2010/03/30/sharding-with-the-fishes/

MySQL to Mongo Query similarity

http://nosqlpedia.com/wiki/File:MongoDB.JPG
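To give a feel for the similarity without the image, here is a SQL `WHERE` clause next to the equivalent MongoDB-style filter document. The `users` table and its fields are made up for illustration, and `matches` is a toy in-memory matcher written for this sketch (it handles only equality and `$gt`); it is not MongoDB's query engine or driver API.

```python
# The same question asked two ways (hypothetical "users" data).
SQL_QUERY = "SELECT * FROM users WHERE age > 30 AND city = 'NYC'"
MONGO_FILTER = {"age": {"$gt": 30}, "city": "NYC"}


def matches(doc, query):
    """Toy evaluator for a small subset of MongoDB-style filters."""
    for field, condition in query.items():
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$gt":
                    if field not in doc or not doc[field] > operand:
                        return False
                else:
                    raise NotImplementedError(op)
        elif doc.get(field) != condition:
            return False
    return True


users = [
    {"name": "ann", "age": 41, "city": "NYC"},
    {"name": "bob", "age": 25, "city": "NYC"},
    {"name": "cat", "age": 52, "city": "SF"},
]
selected = [u["name"] for u in users if matches(u, MONGO_FILTER)]
```

The filter is itself a document, so queries are built with the same data structures as the records they select.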

Riak

Basho, Joyent

Erlang

Distributed

HTTP, protobuf

Native javascript, erlang

Multiple backends

Homogeneous

CAP tunable
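"CAP tunable" is easier to see with numbers. In Dynamo-style stores the knobs are usually described as N (replicas per key), R (replicas that must answer a read) and W (replicas that must acknowledge a write). The sketch below only does the arithmetic; the function name `describe_quorum` is hypothetical, it is not a Riak client call, and it assumes strict quorums.

```python
def describe_quorum(n: int, r: int, w: int) -> dict:
    """Summarize the trade-off for N replicas, read quorum R, write quorum W."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= r <= n and 1 <= w <= n")
    return {
        # If R + W > N, every read set shares a replica with every write set.
        "read_write_sets_overlap": r + w > n,
        # Replicas that can be down while the operation still succeeds.
        "read_tolerates_down": n - r,
        "write_tolerates_down": n - w,
    }
```

For example, `describe_quorum(3, 2, 2)` leans toward consistency (overlapping quorums, one replica may be down), while `describe_quorum(3, 1, 1)` leans toward availability and latency at the cost of possibly stale reads.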

Hadoop

Cloudera, Apache Foundation

Java

High latency

Batch oriented

HDFS is GFS based

Open source Google stack via the Google papers

Huge ecosystem

Yahoo, FB, Twitter, Fortune 500

Pig, Hive, Flume
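The batch model behind Hadoop follows the map/reduce idea from the Google paper linked earlier. Below is a single-process sketch of the classic word count, written in plain Python rather than against Hadoop's Java API; the function names are illustrative, and a real job would spread the map and reduce phases across many machines with HDFS underneath.

```python
from collections import defaultdict


def map_phase(document):
    # Emit a (word, 1) pair for every word in one input record.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Collapse all values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

Because each map call sees one record and each reduce call sees one key, both phases parallelize naturally, which is also why the model is batch oriented and high latency.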

HBase

Java

Low latency store

sits on top of Hadoop

Modeled after Google Bigtable

Column oriented

Thrift, protobuf

Backend for new Facebook Messaging service

Cassandra

Apache

Java

Column oriented

Like Bigtable and Dynamo

Originated at Facebook

At Twitter, distributed counting:

http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King

http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011

Redis

OpenRedis

C

REmote DIctionary Server

Specific data structures

incredibly fast

memcached on steroids

replicated master/slave

Commonalities

Open Source

Adherence to common or standard:

data formats

json, bson, utf8, binary

data transport mechanisms

http, thrift, protobuf, simple wire protocols

Ok. So Now What?

Analyze your requirements

Mailing lists

IRC, twitter

Project pages, wiki

Github/Google Code/Bitbucket:

project page

specific language clients

Variety Pack

Hybrid architectures will become the norm

Twitter - mysql, cassandra, hadoop

Google - mysql, GAE (BT)

Facebook - mysql, cassandra, hbase, memcached

Yahoo - mysql, hadoop

LinkedIn - voldemort

http://www.flickr.com/photos/uncleweed/82245324/

Questions?
