No SQL and MongoDB - Hyderabad Scalability Meetup

RDBMS: Past and Present

Past
- Introduced in 1970s
- Solved prevalent data storage issues

Present: Web Scale challenges today
- Data explosion in past few years
- Single web request may fire 10s/100s of queries!
- Agile development
- Hardware challenges - leverage low cost cloud infrastructure

What is NoSQL

The Need

CAP Theorem - It is impossible for a distributed computer system to provide consistency (C), availability (A), and partition tolerance (P) all at the same time

[Figure: CAP triangle, with systems placed on the side joining the two properties they provide]
- C and P: MongoDB, Redis, HBase, BigTable
- A and P: Cassandra, SimpleDB, Dynamo
- C and A: RDBMS

Solutions
- Automatic Replication
- Auto Sharding
- Integrated Caching
- Dynamic Schema

NoSQL Database Types
- Document Database
- Graph Stores
- Key-Value Stores
- Column Stores

Document Database

What is it?

• Documents are independent units

• Can store semi-structured Data with ease

Where is it useful?

• Ex. Product information in an ecommerce site.

Popular DBs

• MongoDB, CouchDB

Graph stores

What is it?

• Based on graph theory

• Employ nodes, properties, and edges

Where is it useful?

• Ex. Social graphs

Popular DBs

• Neo4j, AllegroGraph, GraphDB
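
To make the nodes, properties, and edges idea concrete, here is a minimal sketch in plain JavaScript. It is not how Neo4j or any other graph database works internally; the people, the `edges` list, and the `friendsOfFriends` helper are all made up for illustration. It answers a typical social-graph question: who is two hops away from a user?

```javascript
// Hypothetical social graph: nodes carry properties, edges are "friend" links.
const nodes = {
  alice: { name: "Alice" },
  bob: { name: "Bob" },
  carol: { name: "Carol" },
  dave: { name: "Dave" },
};
const edges = [
  ["alice", "bob"],
  ["bob", "carol"],
  ["carol", "dave"],
];

// Build an undirected adjacency list from the edge list.
function buildAdjacency(edgeList) {
  const adjacency = new Map();
  for (const [a, b] of edgeList) {
    if (!adjacency.has(a)) adjacency.set(a, new Set());
    if (!adjacency.has(b)) adjacency.set(b, new Set());
    adjacency.get(a).add(b);
    adjacency.get(b).add(a);
  }
  return adjacency;
}

// Friends-of-friends: nodes two hops away that are not already direct friends.
function friendsOfFriends(adjacency, start) {
  const direct = adjacency.get(start) || new Set();
  const result = new Set();
  for (const friend of direct) {
    for (const candidate of adjacency.get(friend) || []) {
      if (candidate !== start && !direct.has(candidate)) result.add(candidate);
    }
  }
  return [...result].sort();
}

const adjacency = buildAdjacency(edges);
const suggestions = friendsOfFriends(adjacency, "alice").map((id) => nodes[id].name);
console.log(suggestions);
```

In a relational schema the same question would typically need a self-join on a friendship table; a graph store keeps the edges as first-class data so the traversal is direct.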

Key-value stores

What is it?

• Stores key-value pairs.

• Several variations, such as in-memory DBs

Where is it useful?

• Ex. Quick access of data based on a key

Popular DBs

• Redis, Memcached
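
As a rough picture of the in-memory variation, the sketch below wraps a JavaScript `Map` in a tiny store. The `KeyValueStore` class and the `session:42` key are invented for this example and do not mirror the API of Redis or any other product.

```javascript
// A tiny in-memory key-value store built on Map (illustrative only).
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  // Store or overwrite the value kept under a key.
  set(key, value) {
    this.data.set(key, value);
  }
  // Return the value for a key, or null when the key is absent.
  get(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  }
  // Remove a key; returns 1 if something was deleted, 0 otherwise.
  del(key) {
    return this.data.delete(key) ? 1 : 0;
  }
}

const cache = new KeyValueStore();
cache.set("session:42", { user: "abc123" });
const hit = cache.get("session:42");
const miss = cache.get("session:99");
console.log(hit, miss);
```

The only access path is the key, which is what makes lookups quick and also what limits the kinds of queries such a store can answer.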

Column stores

What is it?

• Stores the values of each column together, rather than the values of each row

Where is it useful?

• Ex. Semi-structured data

• Useful for large data with aggregations

Popular DBs

• HBase, BigTable (Google)
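
The difference between the two layouts can be sketched with plain arrays. This is only a mental model, not the storage format of HBase or BigTable; the sample records (including the made-up `cde002` user) and the `toColumns` helper are assumptions for illustration.

```javascript
// Row-oriented layout: each record keeps all of its fields together.
const rows = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "bcd001", age: 45, status: "A" },
  { user_id: "cde002", age: 20, status: "D" },
];

// Column-oriented layout: all values of one column sit together.
// Assumes every record has the same fields, so the arrays stay aligned.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      (columns[field] = columns[field] || []).push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// An aggregation over one column only needs to scan that column's array.
const averageAge =
  columns.age.reduce((sum, age) => sum + age, 0) / columns.age.length;
console.log(averageAge);
```

With the column layout, the average touches only the `age` values and never reads `user_id` or `status`, which is why this style suits large aggregations.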

Introduction to MongoDB

A Document database

Instead of storing data in rows and columns as one would with a relational database, MongoDB stores documents in BSON, a binary form of JSON.

Unlike relational databases, it does not impose a flat, rigid schema spread across many tables.

Features of MongoDB

Document data model with dynamic schemas

Full, flexible index support and rich queries

Auto-Sharding for horizontal scalability

Built-in replication for high availability

Text search

Advanced security

Aggregation Framework and MapReduce

Large media storage with GridFS

How does a row look?

{
    FirstName: "Jonathan",
    Address: "15 Wanamassa Point Road",
    Children: [
        { Name: "Michael", Age: 10 },
        { Name: "Jennifer", Age: 8 },
        { Name: "Samantha", Age: 5 },
        { Name: "Elena", Age: 2 }
    ]
}
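
One consequence of this shape is that related data travels with the parent record. The sketch below treats the document as a plain JavaScript object (no database involved) and reads the embedded `Children` array directly; the `youngChildren` name and the age threshold are chosen only for this example.

```javascript
// The document from the slide, as a plain JavaScript object.
const person = {
  FirstName: "Jonathan",
  Address: "15 Wanamassa Point Road",
  Children: [
    { Name: "Michael", Age: 10 },
    { Name: "Jennifer", Age: 8 },
    { Name: "Samantha", Age: 5 },
    { Name: "Elena", Age: 2 },
  ],
};

// The children are embedded, so no second lookup or join is needed.
const youngChildren = person.Children
  .filter((child) => child.Age < 6)
  .map((child) => child.Name);
console.log(youngChildren);
```

In a relational design the children would normally live in their own table and be joined back to the parent by key.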

Terms and Concepts

SQL Terms/Concepts              MongoDB Terms/Concepts
database                        database
table                           collection
row                             document or BSON document
column                          field
index                           index
table joins                     embedded documents and linking
primary key                     primary key
  Specify any unique column       In MongoDB, the primary key is
  or column combination as        automatically set to the _id field.
  primary key.
aggregation (e.g. group by)     aggregation framework

Common Operations - Create Table

SQL Schema Statements

CREATE TABLE users (
    id INT NOT NULL AUTO_INCREMENT,
    user_id Varchar(30),
    age Number,
    status char(1),
    PRIMARY KEY (id)
)

MongoDB Schema Statements

Implicitly created on first insert operation. The primary key _id is automatically added if the _id field is not specified.

db.users.insert( {
    user_id: "abc123",
    age: 55,
    status: "A"
} )

Explicitly create a collection:

db.createCollection("users")

Common Operations - Alter Table

SQL Alter Statements

ALTER TABLE users ADD join_date DATETIME

ALTER TABLE users DROP COLUMN join_date

MongoDB Alter Statements

Collections do not describe or enforce the structure of their documents, so there is nothing to alter at the collection level. Instead, fields can be added to or removed from existing documents with update():

db.users.update(
    { },
    { $set: { join_date: new Date() } },
    { multi: true }
)

db.users.update(
    { },
    { $unset: { join_date: "" } },
    { multi: true }
)

Common Operations - Insert

SQL Insert Statements

INSERT INTO users(user_id, age, status)
VALUES ("bcd001", 45, "A")

MongoDB Insert Statements

db.users.insert( {
    user_id: "bcd001",
    age: 45,
    status: "A"
} )

Common Operations - Select

SQL Select Statements

SELECT user_id, status
FROM users
WHERE status = "A"

MongoDB Select Statements

db.users.find(
    { status: "A" },
    { user_id: 1, status: 1, _id: 0 }
)
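
The two arguments to find() play the roles of WHERE and the SELECT column list. The sketch below imitates that behavior over a plain array so the mapping can be run without a server. It is a simplified stand-in, not MongoDB's query engine: the `matches`, `project`, and `find` helpers are hypothetical, only equality and `$gt` are handled, and the `xyz789` user is made up.

```javascript
const users = [
  { _id: 1, user_id: "abc123", age: 55, status: "A" },
  { _id: 2, user_id: "bcd001", age: 45, status: "A" },
  { _id: 3, user_id: "xyz789", age: 20, status: "D" },
];

// True when the document satisfies every condition in the filter.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. { age: { $gt: 25 } }; only $gt is sketched here.
      return Object.entries(condition).every(([op, operand]) => {
        if (op === "$gt") return doc[field] > operand;
        throw new Error("unsupported operator: " + op);
      });
    }
    return doc[field] === condition; // plain equality, e.g. { status: "A" }
  });
}

// Keep only fields marked 1; _id is kept unless explicitly set to 0.
function project(doc, projection) {
  const out = {};
  if (projection._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, include] of Object.entries(projection)) {
    if (include === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

function find(collection, filter, projection) {
  return collection
    .filter((doc) => matches(doc, filter))
    .map((doc) => project(doc, projection));
}

const result = find(users, { status: "A" }, { user_id: 1, status: 1, _id: 0 });
console.log(result);
```

Note the `_id: 0` in the projection: without it, the `_id` field comes back even though it was not requested, which has no direct counterpart in the SQL statement.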

Common Operations - Update

SQL Update Statements

UPDATE users
SET status = "C"
WHERE age > 25

MongoDB Update Statements

db.users.update(
    { age: { $gt: 25 } },
    { $set: { status: "C" } },
    { multi: true }
)
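
The `multi: true` option matters: a SQL UPDATE touches every matching row, while update() without `multi` changes only the first matching document. The sketch below mimics that difference on a plain array. The `update` helper here is a hypothetical stand-in that takes a predicate function instead of a query document, and the sample users are invented.

```javascript
// Illustrative stand-in for update(): applies $set-style changes in place.
function update(collection, predicate, setFields, options = {}) {
  let modified = 0;
  for (const doc of collection) {
    if (!predicate(doc)) continue;
    Object.assign(doc, setFields);
    modified += 1;
    if (!options.multi) break; // without multi, stop after the first match
  }
  return modified;
}

const users = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "bcd001", age: 45, status: "A" },
  { user_id: "xyz789", age: 20, status: "D" },
];
const olderThan25 = (doc) => doc.age > 25;

// Work on a copy first to show the single-document default.
const copy = users.map((user) => ({ ...user }));
const modifiedSingle = update(copy, olderThan25, { status: "C" });

// With multi: true, every matching document is changed.
const modifiedAll = update(users, olderThan25, { status: "C" }, { multi: true });
console.log(modifiedSingle, modifiedAll);
```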

Common Operations - Delete

SQL Delete Statements

DELETE FROM users WHERE status = "D"

DELETE FROM users

MongoDB Delete Statements

db.users.remove( { status: "D" } )

db.users.remove( { } )

Case Study: Designing A Product Catalog

Problem Overview

Product Catalog

Designing an E-Commerce product catalog system using MongoDB as a storage engine

Product catalogs must have the capacity to store many different types of objects with different sets of attributes.

A Quick Look at Relational Approaches to this problem

Relational Data Models - 1

Concrete Table Inheritance: create a table for each product category

CREATE TABLE `product_audio_album` (
    `sku` char(8) NOT NULL,
    `artist` varchar(255) DEFAULT NULL,
    `genre_0` varchar(255) DEFAULT NULL,
    ...,
    PRIMARY KEY(`sku`))
...
CREATE TABLE `product_film` (
    ...

Downside:
- You must create a new table for every new category of products.
- You must explicitly tailor all queries for the exact type of product.

Relational Data Models - 2

Single Table Inheritance: Single table for all products, add new columns to store data for a new product

CREATE TABLE `product` (
    `sku` char(8) NOT NULL,
    ...
    `artist` varchar(255) DEFAULT NULL,
    `genre_1` varchar(255) DEFAULT NULL,
    ...
    `title` varchar(255) DEFAULT NULL,
    `rating` char(8) DEFAULT NULL,
    ...,
    PRIMARY KEY(`sku`))

Downside: More flexible, but at expense of space

Relational Data Models - 3

Multiple Table Inheritance

CREATE TABLE `product` (
    `sku` char(8) NOT NULL,
    `title` varchar(255) DEFAULT NULL,
    `price`, ...
    PRIMARY KEY(`sku`))

CREATE TABLE `product_audio_album` (
    `sku` char(8) NOT NULL,
    `genre_1` varchar(255) DEFAULT NULL,
    ...,
    PRIMARY KEY(`sku`),
    FOREIGN KEY(`sku`) REFERENCES `product`(`sku`))
...
CREATE TABLE `product_film` (
    ...

Downside: More flexible and saves space, but JOINs are very expensive

Relational Data Models - 4

Entity Attribute Values

Entity          Attribute   Value
sku_00e8da9b    type        Audio Album
sku_00e8da9b    title       A Love Supreme
sku_00e8da9b    ...         ...
sku_00e8da9b    artist      John Coltrane
sku_00e8da9b    genre       Jazz
sku_00e8da9b    genre       General
...             ...         ...

Downside: Totally flexible, but non-trivial queries need a large number of JOINs
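
To see why Entity Attribute Value data is awkward to query, consider what it takes just to reassemble one product. The sketch below pivots the rows from the table above (minus the elided ones) back into one object per entity in plain JavaScript. The `pivot` helper and the lowercase field names are assumptions for illustration; in SQL the same regrouping typically costs one self-join per attribute.

```javascript
// The EAV rows shown on the slide, as plain objects.
const eavRows = [
  { entity: "sku_00e8da9b", attribute: "type", value: "Audio Album" },
  { entity: "sku_00e8da9b", attribute: "title", value: "A Love Supreme" },
  { entity: "sku_00e8da9b", attribute: "artist", value: "John Coltrane" },
  { entity: "sku_00e8da9b", attribute: "genre", value: "Jazz" },
  { entity: "sku_00e8da9b", attribute: "genre", value: "General" },
];

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Regroup rows into one object per entity; repeated attributes become arrays.
function pivot(rows) {
  const products = {};
  for (const { entity, attribute, value } of rows) {
    if (!hasOwn(products, entity)) products[entity] = {};
    const product = products[entity];
    if (!hasOwn(product, attribute)) {
      product[attribute] = value;
    } else if (Array.isArray(product[attribute])) {
      product[attribute].push(value);
    } else {
      product[attribute] = [product[attribute], value];
    }
  }
  return products;
}

const products = pivot(eavRows);
console.log(products["sku_00e8da9b"]);
```

The pivoted result is already close to a document: one object per product, with a multi-valued `genre` held as an array.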

Non-relational Data Model

Use a single MongoDB collection to store all the product data

Dynamic schema means that each document need not conform to the same schema

The document for each product only needs to contain attributes relevant to that product.

So how does data look in MongoDB with the non-relational approach?

{
    sku: "00e8da9b",
    type: "Audio Album",
    title: "A Love Supreme",
    description: "by John Coltrane",
    asin: "B0000A118M",
    shipping: { },
    pricing: { },
    details: { }
}
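
A small sketch of what "one collection, different attributes per product" means in practice, using a plain array in place of a collection. The slide leaves the `details` sub-document empty, so the fields inside `details` below, the whole film product, and the `findByType` helper are hypothetical placeholders.

```javascript
// Two products with different attribute sets living side by side.
// The contents of `details` and the entire second product are made up.
const products = [
  {
    sku: "00e8da9b",
    type: "Audio Album",
    title: "A Love Supreme",
    details: { artist: "John Coltrane", genre: ["Jazz", "General"] },
  },
  {
    sku: "hypothetical-01",
    type: "Film",
    title: "Example Film",
    details: { director: "Example Director", runtime_minutes: 90 },
  },
];

// One query path works for every product type.
function findByType(collection, type) {
  return collection.filter((product) => product.type === type);
}

const albumTitles = findByType(products, "Audio Album").map((p) => p.title);

// Each document carries only the detail fields that apply to it.
const detailFields = products.map((p) => Object.keys(p.details));
console.log(albumTitles, detailFields);
```

Adding a new product category means inserting documents with a new `type` and their own `details` fields; no table or column has to be created first.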

When to Choose MongoDB over RDBMS

2/17/2015

Best Practices for MongoDB

NoSQL products (and among them MongoDB) should be used to meet specific challenges.

High Write Load

- MongoDB by default prefers high insert rate over transaction safety.
- Best suited to records that each carry low business value
- Good examples are logs, streaming data, bulk loads

High Availability in an Unreliable Environment

- Setting up a replica set (a set of servers that act as master and slaves) is easy and fast.
- Fast, automatic recovery (failover) from failures of nodes (or a data center)

Growth in data size with time

- Partitioning tables is complicated in RDBMS
- If your data is going to exceed a few GB per table, you should consider where you want to store it
- MongoDB provides a simple sharding mechanism to shard the data and horizontally scale your application

Location Based Service

- Use MongoDB if you store geo-locations and wish to perform proximity queries or related searches
- MongoDB geo queries are fast and accurate
- Several use cases of geo-locations in production apps
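
For readers unfamiliar with proximity queries, the sketch below shows the underlying idea in plain JavaScript: compute the great-circle (haversine) distance from a center point and keep the places within a radius, nearest first. It is a linear scan for illustration only, whereas a database answers this from a geospatial index. The `near` helper, the `{ lat, lon }` shape, and the approximate coordinates are assumptions of this example.

```javascript
const EARTH_RADIUS_KM = 6371;
const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Great-circle distance between two { lat, lon } points, in kilometres.
function haversineKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLon = toRadians(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Places within maxKm of the center, sorted nearest first.
function near(places, center, maxKm) {
  return places
    .map((place) => ({ ...place, km: haversineKm(center, place) }))
    .filter((place) => place.km <= maxKm)
    .sort((x, y) => x.km - y.km);
}

// Approximate coordinates, used only as sample data.
const hyderabad = { lat: 17.385, lon: 78.4867 };
const places = [
  { name: "Secunderabad", lat: 17.4399, lon: 78.4983 },
  { name: "Bengaluru", lat: 12.9716, lon: 77.5946 },
  { name: "Charminar", lat: 17.3616, lon: 78.4747 },
];

const nearby = near(places, hyderabad, 50).map((place) => place.name);
console.log(nearby);
```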

Large data sets with Unstable schema

- When your data is reasonably large, it is complicated to change your schema
- When you work in an Agile model, your product can change shape dynamically
- MongoDB is schema-less

No Dedicated DBA!

- Complicated operations such as normalization and joins are avoided in MongoDB
- Backup and storage mechanisms are provided out of the box (MMS)

{ "Scaling" : true}

Scaling: Sharding

- Scale linearly as data grows

- Add more nodes

- Choose a shard key wisely
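
Why the shard key matters can be shown with a toy router. The sketch below hashes a key value and maps it to one of a fixed number of shards. This is not MongoDB's routing algorithm; the FNV-1a hash, the `shardFor` helper, the shard count, and the generated user ids are all assumptions chosen to keep the example self-contained.

```javascript
// 32-bit FNV-1a hash over UTF-16 code units (equivalent to bytes for ASCII input).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Map a shard key value to a shard index in [0, shardCount).
function shardFor(shardKeyValue, shardCount) {
  return fnv1a(String(shardKeyValue)) % shardCount;
}

const shardCount = 4;

// A high-cardinality key (a user id) can spread documents over all shards.
const userIds = Array.from({ length: 1000 }, (_, i) => "user" + i);
const perShard = new Array(shardCount).fill(0);
for (const id of userIds) perShard[shardFor(id, shardCount)] += 1;

// A low-cardinality key (status "A" or "D") can never use more than two shards.
const statusShards = new Set(["A", "D"].map((status) => shardFor(status, shardCount)));
console.log(perShard, statusShards.size);
```

However many nodes are added, a key with only two distinct values cannot spread load across more than two of them, which is one reason the choice of key deserves care.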

Scaling: Replica Sets

- Make your system highly available

- Read-only replicas for reporting help reduce load

- Read Consistency across Replicas

HA Architecture

More Scaling?

- Capped Collections

- Use SSDs

- More RAM

- Faster cores rather than more cores (mongod not optimized for multi-core)
- Consider Aggregation framework for complex reports

- Text Search Support!
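
Capped collections, the first item in the list above, are fixed-size collections that keep insertion order and discard the oldest documents once full. The sketch below models that behavior with a bounded array. Real capped collections are limited by size in bytes (optionally also by document count); this hypothetical `CappedLog` class limits by count only to keep the idea visible.

```javascript
// Bounded, insertion-ordered log that evicts the oldest entry when full.
class CappedLog {
  constructor(maxDocuments) {
    this.maxDocuments = maxDocuments;
    this.documents = [];
  }
  insert(doc) {
    this.documents.push(doc);
    if (this.documents.length > this.maxDocuments) {
      this.documents.shift(); // drop the oldest document
    }
  }
  // Documents in insertion order, oldest first.
  find() {
    return [...this.documents];
  }
}

const log = new CappedLog(3);
for (let i = 1; i <= 5; i++) log.insert({ seq: i });
const kept = log.find().map((doc) => doc.seq);
console.log(kept);
```

Because old entries age out on their own, this pattern suits logs and other data where only the most recent records matter.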

Real World Case Study

Real-world case study

http://www.slideshare.net/oc666/mongodb-user-group-billrun

- BillRun is a next-generation open source billing solution that uses MongoDB as its data store.
- This billing system runs in production at the fastest growing cellular operator in Israel, where it processes over 500M CDRs (call data records) each month.

Schema-less design

- enables rapid introduction of new CDR types to the system.
- It lets BillRun keep the data store generic.

Scale

- BillRun production site already manages several TB in a single table.
- Not limited when adding new fields or by data growth

Rapid replicaSet

- enables meeting regulations with an easy-to-set-up, multi-data-center DRP and HA solution.

Sharding

- enables linear, scale-out growth without running out of budget.

Geo API

- is used to analyze users' usage and determine where to invest in cellular infrastructure

HuMongous

With over 2,000 CDR inserts per second, MongoDB's architecture is a good fit for a system that must support a high insert load. You can still get transaction-like guarantees with findAndModify (which is slower) and a two-phase commit implemented in the application.

References and further readings!

- MongoDB documentation: http://docs.mongodb.org/manual/
- Tutorials and certificate programs: https://education.10gen.com/

References:

- http://java.dzone.com/articles/when-use-mongodb-rather-mysql
- http://www.mysqlperformanceblog.com/2013/08/01/schema-design-in-mongodb-vs-schema-design-in-mysql/

{
    Topic: "MongoDB By Example",
    Presenter: "Ritesh Gupta",
    Info: {
        Mail: ["[email protected]"],
        Designation: "Sr Architect",
        Company: "TechVedika",
        Url: "www.techvedika.com"
    }
}

Thank You!