NoSQL and MongoDBeldawy/20SCS167/slides/CS167-10-NoSQL.pdfPrior to MongoDB 3.2, only B-tree was...

NoSQL and MongoDB

Introduction to NoSQL

Based on a presentation by Traversy Media

What is NoSQL?

Not only SQL

SQL means:
  Relational model
  Strong typing
  ACID compliance
  Normalization
  …

NoSQL means more freedom or flexibility

Relevance to Big Data

Data gets bigger
Traditional RDBMS cannot scale well
RDBMS is tied to its data and query processing models
NoSQL relaxes some of the restrictions of RDBMS to provide better performance

Advantages of NoSQL

Handles Big Data
Data Models: No predefined schema
Data Structure: NoSQL handles semi-structured data
Cheaper to manage
Scaling: Scale out / horizontal scaling

Advantages of RDBMS

Better for relational data
Data normalization
Well-established query language (SQL)
Data Integrity
ACID Compliance

Types of NoSQL Databases

Document Databases [MongoDB, CouchDB]
Column Databases [Apache Cassandra]
Key-Value Stores [Redis, Couchbase Server]
Cache Systems [Redis, Memcached]
Graph Databases [Neo4j]
Streaming Systems [Apache Flink, Apache Storm]

Structured/Semi-structured

ID  Name  Email             …
1   Jack  jack@example.com
2   Jill  jill@example.net
3   Alex  alex@example.org

Document 1

{ "id": 1, "name": "Jack", "email": "jack@example.com", "address": {"street": "900 university ave", "city": "Riverside", "state": "CA"}, "friend_ids": [3, 55, 123]}

Document 2

{ "id": 2, "name": "Jill", "email": "jill@example.net", "hobbies": ["hiking", "cooking"]}

Columnar Data Store

ID Name Email …

Key-value Stores

1 → Jack jack@example.com …
2 → Jill jill@example.net …
3 → Alex alex@example.org …

ID Name Email …

Document Database

MongoDB

Document Data Model

Relational model (RDBMS)
  Database
    Relation (Table) : Schema
      Record (Tuple) : Data

Document Model
  Database
    Collection : No predefined schema
      Document : Schema+data

No need to define/update schema
No need to create collections explicitly (a collection is created on the first insert)

Document 1

{ "id": 1, "name": "Jack", "email": "jack@example.com", "address": {"street": "900 university ave", "city": "Riverside", "state": "CA"}, "friend_ids": [3, 55, 123]}

Document Format

MongoDB natively works with JSON documents
For efficiency, documents are stored in a binary format called BSON (i.e., binary JSON)
Like JSON, both schema and data are stored in each document
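To make the "schema and data in each document" point concrete, here is a minimal sketch in plain JavaScript. It parses the two example documents from the earlier slide and derives a field list per document. The helper name `describeSchema` is hypothetical, and the code uses neither MongoDB nor BSON.

```javascript
// Two documents that could live in the same collection, with different fields.
const doc1 = JSON.parse(`{"id": 1, "name": "Jack", "email": "jack@example.com",
  "address": {"street": "900 university ave", "city": "Riverside", "state": "CA"},
  "friend_ids": [3, 55, 123]}`);
const doc2 = JSON.parse(`{"id": 2, "name": "Jill", "email": "jill@example.net",
  "hobbies": ["hiking", "cooking"]}`);

// Hypothetical helper: map each field name to a coarse type label.
function describeSchema(doc) {
  const schema = {};
  for (const [field, value] of Object.entries(doc)) {
    if (Array.isArray(value)) {
      schema[field] = "array";
    } else if (value !== null && typeof value === "object") {
      schema[field] = describeSchema(value); // nested document
    } else {
      schema[field] = value === null ? "null" : typeof value;
    }
  }
  return schema;
}

console.log(describeSchema(doc1));
console.log(describeSchema(doc2));
```

The two results differ (only the first has `address` and `friend_ids`, only the second has `hobbies`), which is what "no predefined schema" means in practice.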

How to Use MongoDB

Install: Check the MongoDB website
https://docs.mongodb.com/manual/installation/

Create collection and insert a document
db.users.insert({name: "Jack", email: "jack@example.com"});

Retrieve all/some documents
db.users.find();
db.users.find({name: "Jack"});

Update
db.users.update({name: "Jack"}, {$set: {hobby: "cooking"}});
Also: updateOne, updateMany, replaceOne

Delete
db.users.remove({name: "Alex"});
Also: deleteOne, deleteMany

https://docs.mongodb.com/manual/crud/
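The semantics of the filter document and the `$set` operator can be sketched without a server. The class below is a toy in-memory stand-in, not MongoDB's implementation: `ToyCollection` is a hypothetical name, it supports only equality filters and `$set`, and its method names merely mirror the shell's `insertOne`, `find`, `updateOne`, and `deleteMany`.

```javascript
// Toy in-memory stand-in for a collection; NOT MongoDB's implementation.
// Only equality filters and the $set operator are supported.
class ToyCollection {
  constructor() {
    this.docs = [];
  }

  // True when every field in the filter equals the document's field.
  static matches(doc, filter) {
    return Object.entries(filter).every(([k, v]) => doc[k] === v);
  }

  insertOne(doc) {
    this.docs.push({ ...doc });
  }

  // An empty filter matches every document.
  find(filter = {}) {
    return this.docs.filter((d) => ToyCollection.matches(d, filter));
  }

  // Apply $set to the first matching document only.
  updateOne(filter, update) {
    const doc = this.docs.find((d) => ToyCollection.matches(d, filter));
    if (doc) Object.assign(doc, update.$set || {});
    return { matchedCount: doc ? 1 : 0 };
  }

  // Remove every matching document.
  deleteMany(filter) {
    const before = this.docs.length;
    this.docs = this.docs.filter((d) => !ToyCollection.matches(d, filter));
    return { deletedCount: before - this.docs.length };
  }
}

const users = new ToyCollection();
users.insertOne({ name: "Jack", email: "jack@example.com" });
users.insertOne({ name: "Alex", email: "alex@example.org" });
users.updateOne({ name: "Jack" }, { $set: { hobby: "cooking" } });
users.deleteMany({ name: "Alex" });
console.log(users.find());
```

After these calls only Jack's document remains, and it has gained a `hobby` field that was never declared anywhere.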

Schema Validation

You can still explicitly create collections and enforce schema validation

db.createCollection("students", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "name", "year", "major", "address" ],
      properties: {
        name: {
          bsonType: "string",
          description: "must be a string and is required"
        }
        // remaining properties are cut off on the slide
      }
    }
  }
});

https://docs.mongodb.com/manual/core/schema-validation/
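What such a validator checks can be approximated in a few lines. The sketch below is an illustration only: `validate` is a hypothetical function covering a small subset of `$jsonSchema` (`required`, plus `bsonType` limited to `string`, `int`, and `object`), and it is not how MongoDB evaluates validators internally.

```javascript
// Toy validator for a tiny subset of $jsonSchema: "required" and
// per-property "bsonType" limited to "string", "int", and "object".
function validate(doc, schema) {
  const errors = [];
  for (const field of schema.required || []) {
    if (!(field in doc)) errors.push(`missing required field: ${field}`);
  }
  for (const [field, rule] of Object.entries(schema.properties || {})) {
    if (!(field in doc)) continue; // absence is reported by "required" only
    const v = doc[field];
    const ok =
      rule.bsonType === "string" ? typeof v === "string" :
      rule.bsonType === "int" ? Number.isInteger(v) :
      rule.bsonType === "object" ? (v !== null && typeof v === "object" && !Array.isArray(v)) :
      true; // types outside this sketch are not checked
    if (!ok) errors.push(`${field}: ${rule.description || "wrong type"}`);
  }
  return errors;
}

// Same shape as the slide's schema, limited to the part that is visible.
const studentSchema = {
  bsonType: "object",
  required: ["name", "year", "major", "address"],
  properties: {
    name: { bsonType: "string", description: "must be a string and is required" },
  },
};

console.log(validate({ name: "Jack", year: 2020, major: "CS", address: {} }, studentSchema)); // []
console.log(validate({ name: 42, year: 2020 }, studentSchema));
```

The second call reports two missing fields and one type error, which is the kind of document a validated collection would reject.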

Storage Layer

Prior to MongoDB 3.2, only B-tree was available in the storage layer
To increase its scalability, MongoDB added LSM Tree in later versions after it acquired WiredTiger

Override default configuration:
mongod --wiredTigerIndexConfigString "type=lsm,block_compressor=zlib"

LSM Vs B-tree

https://github.com/wiredtiger/wiredtiger/wiki/Btree-vs-LSM

Indexing

Like RDBMS, document databases use indexes to speed up some queries
MongoDB uses B-tree as an index structure

https://docs.mongodb.com/manual/indexes/

Index Types

Default unique _id index
Single field index
  db.collection.createIndex({name: -1});
Compound index (multiple fields)
  db.collection.createIndex({name: 1, score: -1});
Multikey indexes (for array fields)
  Creates an index entry for each value

https://docs.mongodb.com/manual/indexes/
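The "one index entry for each value" behavior of a multikey index can be sketched as follows. This is a toy model under stated assumptions: `buildMultikeyIndex` is a hypothetical helper, a `Map` stands in for the B-tree, and documents without the field are simply skipped.

```javascript
// Build a toy multikey index: one (value -> document ids) entry per array element.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const raw = doc[field];
    if (raw === undefined) continue; // this sketch simply skips missing fields
    const values = Array.isArray(raw) ? raw : [raw];
    for (const v of new Set(values)) { // a repeated value is indexed once per document
      if (!index.has(v)) index.set(v, []);
      index.get(v).push(doc.id);
    }
  }
  return index;
}

const people = [
  { id: 1, hobbies: ["hiking", "cooking"] },
  { id: 2, hobbies: ["cooking"] },
  { id: 3 },
];
const hobbyIndex = buildMultikeyIndex(people, "hobbies");
console.log(hobbyIndex.get("cooking")); // [ 1, 2 ]
console.log(hobbyIndex.get("hiking"));  // [ 1 ]
```

A lookup for one hobby then touches a single index entry instead of scanning every document's array.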

Index Types

Geospatial index (for geospatial points)
  Uses geohash to convert two dimensions to one dimension
  2d indexes: For Euclidean spaces
  2dsphere indexes: Spherical (earth) geometry
  Works with multikey indexes for multiple locations (e.g., pickup and dropoff locations for taxis)
Text indexes (for string fields)
  Automatically removes stop words
  Stems the words to store the root only
Hashed indexes (for point lookups)

Geohashes
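A geohash maps a two-dimensional point to a one-dimensional string by interleaving bits of longitude and latitude. The sketch below implements the widely used public geohash encoding (base-32 alphabet, longitude bit first); whether MongoDB's internal 2d index uses exactly this layout is an assumption not confirmed by the slides. The function name `geohash` is illustrative.

```javascript
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

// Encode (lat, lon) as a geohash string with `precision` characters.
// Bits alternate between longitude and latitude, starting with longitude;
// each bit records which half of the current interval contains the point.
function geohash(lat, lon, precision) {
  let latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
  let hash = "";
  let bits = 0, bitCount = 0, useLon = true;
  while (hash.length < precision) {
    if (useLon) {
      const mid = (lonLo + lonHi) / 2;
      if (lon >= mid) { bits = bits * 2 + 1; lonLo = mid; }
      else { bits = bits * 2; lonHi = mid; }
    } else {
      const mid = (latLo + latHi) / 2;
      if (lat >= mid) { bits = bits * 2 + 1; latLo = mid; }
      else { bits = bits * 2; latHi = mid; }
    }
    useLon = !useLon;
    if (++bitCount === 5) { // every 5 bits become one base-32 character
      hash += BASE32[bits];
      bits = 0;
      bitCount = 0;
    }
  }
  return hash;
}

console.log(geohash(57.64911, 10.40744, 2)); // "u4"
console.log(geohash(57.64911, 10.40744, 6));
```

Because each extra character only refines the same cell, a longer hash always starts with the shorter one, so an ordinary one-dimensional index over the strings can answer "which points fall in this cell" with a prefix scan.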

Additional Index Features

Unique indexes: Reject duplicate keys
Sparse indexes: Skip documents without the index field
  In contrast, non-sparse indexes assume a null value if the index field does not exist
Partial indexes: Index only a subset of records based on a filter

db.restaurants.createIndex(
  { cuisine: 1, name: 1 },
  { partialFilterExpression: { rating: { $gt: 5 } } }
);
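The difference between non-sparse, sparse, and partial indexes comes down to which documents get an index entry. The following sketch models only that decision; `indexedIds` and its options are hypothetical names, and the partial filter is a plain JavaScript predicate rather than a MongoDB filter expression.

```javascript
// Decide which documents receive an index entry under three toy policies.
function indexedIds(docs, field, { sparse = false, partialFilter = null } = {}) {
  const entries = [];
  for (const doc of docs) {
    if (partialFilter && !partialFilter(doc)) continue; // partial: apply the filter first
    const has = field in doc;
    if (sparse && !has) continue;                        // sparse: skip a missing field
    entries.push({ key: has ? doc[field] : null, id: doc.id }); // non-sparse: null key
  }
  return entries;
}

const restaurants = [
  { id: 1, cuisine: "thai", rating: 7 },
  { id: 2, cuisine: "thai", rating: 3 },
  { id: 3, rating: 9 },
];

console.log(indexedIds(restaurants, "cuisine"));                   // all three, id 3 under null
console.log(indexedIds(restaurants, "cuisine", { sparse: true })); // ids 1 and 2
console.log(indexedIds(restaurants, "cuisine", { partialFilter: (d) => d.rating > 5 })); // ids 1 and 3
```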

Comparison of data types

Values of different types compare in the following order, from lowest to highest:
Min key (internal type)
Null
Numbers (32-bit integer, 64-bit integer, double)
Symbol, String
Object
Array
Binary data
Object ID
Boolean
Date, timestamp
Regular expression
Max key (internal type)

https://docs.mongodb.com/v3.6/reference/bson-type-comparison-order/

Comparison of data types

Numbers: All converted to a common type
Strings
  Alphabetically (default)
  Collation (i.e., locale and language)
Arrays
  <: Smallest value of the array
  >: Largest value of the array
  Empty arrays are treated as null
Object
  Compare fields in the order of appearance
  Compare <name,value> for each field
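These rules can be sketched as a comparator over plain JavaScript values. This is a simplified illustration, not MongoDB's implementation: it covers only null, numbers, strings, objects, and booleans; `typeRank`, `compare`, `sortKey`, and `sortBy` are hypothetical names; and strings are compared by code unit, which stands in for the default (non-collated) ordering.

```javascript
// Rank of a value's type, following the slide's order (subset only).
function typeRank(v) {
  if (v === null || v === undefined) return 1; // Null (a missing field counts as null here)
  if (typeof v === "number") return 2;
  if (typeof v === "string") return 3;
  if (typeof v === "object" && !Array.isArray(v)) return 4;
  if (typeof v === "boolean") return 5;
  throw new TypeError("not covered by this sketch; reduce arrays with sortKey first");
}

// Compare two non-array values: by type rank first, then by value.
function compare(a, b) {
  const ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra < rb ? -1 : 1;
  if (ra === 1) return 0;
  if (ra === 4) { // objects: <name, value> pairs in order of appearance
    const ea = Object.entries(a), eb = Object.entries(b);
    for (let i = 0; i < Math.min(ea.length, eb.length); i++) {
      if (ea[i][0] !== eb[i][0]) return ea[i][0] < eb[i][0] ? -1 : 1;
      const c = compare(ea[i][1], eb[i][1]);
      if (c !== 0) return c;
    }
    return Math.sign(ea.length - eb.length);
  }
  return a < b ? -1 : a > b ? 1 : 0;
}

// Arrays sort by their smallest element (ascending) or largest (descending);
// an empty array is treated as null.
function sortKey(v, ascending) {
  if (!Array.isArray(v)) return v;
  if (v.length === 0) return null;
  return v.reduce((best, x) => {
    const c = compare(x, best);
    return (ascending ? c < 0 : c > 0) ? x : best;
  });
}

function sortBy(docs, field, ascending = true) {
  const sign = ascending ? 1 : -1;
  return [...docs].sort((d1, d2) =>
    sign * compare(sortKey(d1[field], ascending), sortKey(d2[field], ascending)));
}

const mixed = [
  { id: 1, v: "apple" }, { id: 2, v: 7 }, { id: 3, v: [9, 2] },
  { id: 4, v: null }, { id: 5, v: true }, { id: 6, v: [] },
];
console.log(sortBy(mixed, "v", true).map((d) => d.id));  // [ 4, 6, 3, 2, 1, 5 ]
console.log(sortBy(mixed, "v", false).map((d) => d.id)); // [ 5, 1, 3, 2, 4, 6 ]
```

Note how document 3 moves relative to document 2 between the two sorts: ascending it is keyed by 2 (below 7), descending by 9 (above 7).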

Distributed Processing

Two methods for distributed processing:
  Replication (Similar to MySQL)
  Sharding (True horizontal scaling)

Replication
https://docs.mongodb.com/manual/replication/

Sharding
https://docs.mongodb.com/manual/sharding/

Distributed Index Structure

Log-structured Merge Tree (LSM)

Big Data Indexing

Hadoop and Spark are good at scanning large files
We would like to speed up point and range queries on big data
HDFS limitation: Random updates are not allowed
Log-structured Merge Tree (LSM-Tree) is adopted to address this problem

RDBMS Indexing

[Figure: New record → Index Update → Randomly updated disk page(s), contrasted with New record → Append a disk page]

LSM Tree

Key idea: Use the log as the index
Regularly: Merge the logs to consolidate the index (i.e., remove redundant entries)

[Figure: New records → Log, Log, Log, Log, Log; Flush and Merge produce a bigger log]

O’Neil, Patrick, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33, no. 4 (1996): 351-385.

LSM in Big Data

First major application: BigTable (Google)

[Figure: Citations per year, 1997 to 2018, annotated with "First report from Google mentioning LSM" and "BigTable paper"]

LSM in Big Data

Buffer data in memory (memory component)
Flush records to disk into an LSM as a disk component (sequential write)
Disk components are sorted by key
Compact (merge) disk components in the background (sequential read/write)
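The four steps above can be sketched as a toy LSM tree. This is an illustration under simplifying assumptions, not WiredTiger's or BigTable's design: `ToyLSM` is a hypothetical class, arrays stand in for on-disk files, compaction is triggered manually rather than in the background, and deletes are not modeled.

```javascript
// Toy LSM tree: an in-memory buffer plus sorted, immutable "disk" components.
// Arrays stand in for files; nothing here touches a real disk.
const byKey = ([a], [b]) => (a < b ? -1 : a > b ? 1 : 0);

class ToyLSM {
  constructor(memLimit = 2) {
    this.memLimit = memLimit; // flush threshold (number of buffered keys)
    this.mem = new Map();     // memory component
    this.components = [];     // disk components, newest first
  }

  // Writes only touch the memory component.
  put(key, value) {
    this.mem.set(key, value);
    if (this.mem.size >= this.memLimit) this.flush();
  }

  // Write the buffer out as one component sorted by key (a sequential write).
  flush() {
    if (this.mem.size === 0) return;
    this.components.unshift([...this.mem.entries()].sort(byKey));
    this.mem = new Map();
  }

  // Check the buffer first, then components from newest to oldest.
  get(key) {
    if (this.mem.has(key)) return this.mem.get(key);
    for (const comp of this.components) {
      const hit = comp.find(([k]) => k === key); // a real system would binary-search
      if (hit) return hit[1];
    }
    return undefined;
  }

  // Merge all components into one, keeping only the newest value per key.
  // A real implementation would stream a k-way merge of the sorted runs.
  compact() {
    const merged = new Map();
    for (const comp of [...this.components].reverse()) { // oldest first
      for (const [k, v] of comp) merged.set(k, v);       // newer overwrites older
    }
    const sorted = [...merged.entries()].sort(byKey);
    this.components = sorted.length ? [sorted] : [];
  }
}

const lsm = new ToyLSM(2);
lsm.put("b", 1);
lsm.put("a", 2); // buffer full: flushed as component [a, b]
lsm.put("a", 3);
lsm.put("c", 4); // buffer full: flushed as component [a, c]
console.log(lsm.get("a"));          // 3 (the newest component wins)
console.log(lsm.components.length); // 2
lsm.compact();
console.log(lsm.components);        // [ [ [ 'a', 3 ], [ 'b', 1 ], [ 'c', 4 ] ] ]
```

No existing component is ever modified in place: updates are appended as new components, and the stale entry for "a" only disappears when compaction rewrites the data sequentially.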

Conclusion

MongoDB is a document database that is geared towards high update rates and transactional queries
It adopts JSON as a data model
It provides the flexibility to insert any kind of data without schema definition
LSM Tree is used for indexing
Weak types are handled using a special comparison method for all types
