2011 Mongo FR - MongoDB introduction
Uploaded by antoinegirbal
open-source, high-performance, document-oriented database
Antoine Girbal
RDBMS (Oracle, MySQL)
New Gen. OLAP (vertica, aster, greenplum)
Non-relational Operational Stores (“NoSQL”)
RDBMS Helper (MemCache, Application layer)
Data Store Analytics
non-relational, next-generation operational datastores and databases
focus on scalability, ease of modeling and changing data
(also no SQL syntax, thanks!)
NoSQL Really Means:
Horizontally Scalable Architectures
no joins + no complex transactions
JSON-style Documents
{ hello: "world" }
represented as BSON:
\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00
http://bsonspec.org
Just like a light and friendly XML
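To make the byte string on this slide less opaque, here is a minimal sketch of how `{ hello: "world" }` turns into those 22 bytes. It is a hand-rolled encoder, not a driver API: the function name `encodeStringDocument` is made up, it only handles flat documents whose values are strings, and it assumes Node.js for `Buffer`.

```javascript
// Hand-rolled sketch: encodes a flat document of string values into BSON bytes.
// Not a real driver function; it only covers the string element type (0x02).
function encodeStringDocument(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const valueBytes = Buffer.from(value, "utf8");
    const valueLength = Buffer.alloc(4);
    valueLength.writeInt32LE(valueBytes.length + 1); // length counts the trailing NUL
    parts.push(
      Buffer.from([0x02]),           // element type: UTF-8 string
      Buffer.from(key, "utf8"),      // key name ...
      Buffer.from([0x00]),           // ... terminated like a C string
      valueLength,                   // little-endian int32 string length
      valueBytes,
      Buffer.from([0x00])            // string terminator
    );
  }
  const body = Buffer.concat(parts);
  const totalLength = Buffer.alloc(4);
  totalLength.writeInt32LE(4 + body.length + 1); // size prefix + elements + final NUL
  return Buffer.concat([totalLength, body, Buffer.from([0x00])]);
}

const helloBson = encodeStringDocument({ hello: "world" });
console.log(helloBson.length);          // 22, the \x16 at the start of the slide's bytes
console.log(helloBson.toString("hex"));
```

The leading `\x16\x00\x00\x00` is the total document size (22) as a little-endian int32, and `\x06\x00\x00\x00` is the length of "world" plus its terminator.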
Flexible “Schemas”
In collection db.posts:
{author: "mike", links: 3, date: "Sun Jul 18 2010 14:40:20 GMT-0700 (PDT)", text: "blah blah"}
{author: "eliot", date: "Sun Jul 18 2010 14:40:22 GMT-0700 (PDT)", text: "Here is MongoDB ...", views: 10}
Documents with different fields can potentially all live in the same collection
Embedded Document
{ _id: ObjectId("4d1009c7262bb4b94af1cea4"),
  author_id: "1346",
  date: "Sun Jul 18 2010 14:40:20 GMT-0700 (PDT)",
  title: "my story",
  text: "once upon a time ...",
  tags: ["novel", "english"],
  comments: [
    {user_id: 234, text: "awesome dude"},
    {user_id: 1235, text: "that made me cry"}]}
little need for joins or transactions across documents!
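As a rough illustration of why embedding reduces the need for joins, the sketch below reads the comments of a post in two ways: straight from an embedded document, and by stitching two separate arrays together. It is plain JavaScript over in-memory objects rather than a MongoDB call, and the `normalizedPosts` / `normalizedComments` arrays with their `id` and `post_id` fields are invented for the comparison.

```javascript
// Embedded model: the post carries its own comments (shape taken from the slide).
const embeddedPost = {
  author_id: "1346",
  title: "my story",
  tags: ["novel", "english"],
  comments: [
    { user_id: 234, text: "awesome dude" },
    { user_id: 1235, text: "that made me cry" }
  ]
};

// Normalized model (hypothetical): comments live elsewhere and point back via post_id.
const normalizedPosts = [{ id: 1, author_id: "1346", title: "my story" }];
const normalizedComments = [
  { post_id: 1, user_id: 234, text: "awesome dude" },
  { post_id: 1, user_id: 1235, text: "that made me cry" },
  { post_id: 2, user_id: 99, text: "unrelated" }
];

// Embedded: one read returns the post together with its comments.
const embeddedTexts = embeddedPost.comments.map(c => c.text);

// Normalized: a second, join-like lookup is needed to collect the same comments.
const joinedTexts = normalizedPosts
  .filter(p => p.title === "my story")
  .flatMap(p => normalizedComments.filter(c => c.post_id === p.id))
  .map(c => c.text);

console.log(embeddedTexts);
console.log(joinedTexts);
```

Both approaches yield the same two comment texts; the embedded version simply gets them without a second lookup.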
Data Model
Normalized (many tables, row/column):
  PERFORMANCE: can't scale
  FEATURES: (too) many
Natural (collections, documents):
  PERFORMANCE: as fast as BLOBs if well modeled, horizontal scalability
  FEATURES: most regular SQL features (satisfies 90% of users)
Just BLOBs:
  PERFORMANCE: easy to scale (Dynamo), can easily use caching
  FEATURES: no features
•Complex querying
•Atomic updates with modifiers
•Indexing (unique, compound, Geo)
•Aggregation and Map / Reduce
•Capped Collections
•Powerful Shell (Javascript)
•GridFS: file storage
Replication
Using Replica Set:
- pool of servers with 1 master
- automatic master election and failover
- distributed reads (slaveOk)
[Diagram: two clients, one master, three slaves]
Sharding
[Diagram: client; mongos ... mongos; Shards (mongod ...); Config Servers (mongod)]
For large datasets or write-heavy systems
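The slide does not spell out how a mongos decides which shard receives a request. A minimal sketch, assuming range-based partitioning on a single shard key (here the author name): the chunk boundaries, the shard names, and the `routeToShard` function are all hypothetical and only illustrate the idea, not MongoDB internals.

```javascript
// Hypothetical chunk table: each chunk covers the key range [min, max).
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "q", shard: "shard1" },
  { min: "q", max: "\uffff", shard: "shard2" }
];

// Pick the shard whose chunk range contains the shard key value.
function routeToShard(shardKeyValue) {
  const chunk = chunks.find(c => shardKeyValue >= c.min && shardKeyValue < c.max);
  if (!chunk) throw new Error("no chunk covers key: " + shardKeyValue);
  return chunk.shard;
}

console.log(routeToShard("eliot")); // shard0
console.log(routeToShard("mike"));  // shard1
console.log(routeToShard("tony"));  // shard2
```

Because each key maps to exactly one range, writes for different authors can land on different servers, which is what makes the layout attractive for write-heavy systems.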
Support
•OS: Mac OS X, Windows, Linux, Solaris, 32/64 bits
•Drivers: C, C#, C++, Haskell, Java, Javascript, Perl, PHP, Python, Ruby, Scala... + community drivers
•Open-source project with active community, Wiki, Google Group, 10gen consulting / support
Production Examples
Shutterfly, Foursquare, Craigslist, bit.ly, IGN, Sourceforge, Etsy, the New York Times, Business Insider, Gilt Groupe, Intuit, CollegeHumor, Evite, Disqus, Justin.tv, Heartbeat, Hot Potato, Eventbrite, SugarCRM, Electronic Arts ...
New Post
> post = {author: "mike",
... date: new Date(),
... text: "my blog post",
... tags: ["mongodb", "intro"]}
> db.posts.save(post)
> db.posts.findOne()
{
    "_id" : ObjectId("4d2f944103e8fdbb36f6d205"),
    "author" : "mike",
    "date" : ISODate("2011-01-14T00:08:49.933Z"),
    "text" : "my blog post",
    "tags" : ["mongodb", "intro"]
}
A Quick Aside
_id
•special key
•present in all documents
•unique across a Collection
•any type you want
Update
> db.posts.update({_id: post._id},
... { $set: {author: "tony"}})
> c = {author: "eliot", date: new Date(), text: "great post!"}
> db.posts.update({_id: post._id},
... { $push: {comments: c}})
> db.posts.update({_id: post._id},
... { $inc: {views: 1}})
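To show what the three modifiers above do to a document, here is a toy in-memory version of `$set`, `$push` and `$inc`. It is a sketch for intuition only: `applyUpdate` is a made-up helper, it ignores dotted field paths and every other modifier, and it does not reproduce the server's error handling.

```javascript
// Toy in-memory versions of three update modifiers; not the server implementation.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;                              // $set: overwrite or create the field
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = [];   // create the array if the field is missing
    doc[field].push(value);                          // $push: append one element
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount;         // $inc: a missing field starts from 0
  }
  return doc;
}

const draftPost = { author: "mike", text: "my blog post" };
applyUpdate(draftPost, { $set: { author: "tony" } });
applyUpdate(draftPost, { $push: { comments: { author: "eliot", text: "great post!" } } });
applyUpdate(draftPost, { $inc: { views: 1 } });
console.log(draftPost);
```

After the three calls the document has the new author, a one-element `comments` array and `views` set to 1, the same shape the next slide prints from `findOne()`.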
Querying
> db.posts.findOne()
{
    "_id" : ObjectId("4d2f944103e8fdbb36f6d205"),
    "author" : "tony",
    "comments" : [
        {
            "author" : "eliot",
            "date" : ISODate("2011-01-14T00:13:52.463Z"),
            "text" : "great post!"
        }
    ],
    "date" : ISODate("2011-01-14T00:08:49.933Z"),
    "tags" : [
        "mongodb",
        "intro"
    ],
    "text" : "my blog post",
    "views" : 1
}
More Querying
Find by Author:
> db.posts.find({author: "tony"})
10 most recent posts:
> db.posts.find().sort({date: -1}).limit(10)
Posts since April 1st:
> april_1 = new Date(2010, 3, 1)
> db.posts.find({date: {$gt: april_1}})
Adding an index to speed up:
> db.posts.ensureIndex({author: 1})
> db.posts.ensureIndex({date: 1})
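One detail in the date query is easy to misread: JavaScript's `Date` constructor counts months from zero, so `new Date(2010, 3, 1)` is April 1st, not March 1st. The sketch below checks that and mimics the `$gt` filter and the `sort({date: -1}).limit(...)` pattern over an in-memory array. The sample posts are invented, and the limit is 2 instead of 10 to keep the output short.

```javascript
const april1 = new Date(2010, 3, 1);
console.log(april1.getMonth()); // 3, which is April because months start at 0

// Made-up sample data standing in for db.posts.
const samplePosts = [
  { author: "mike", date: new Date(2010, 1, 15) },  // February
  { author: "tony", date: new Date(2010, 6, 18) },  // July
  { author: "eliot", date: new Date(2010, 4, 2) }   // May
];

// Equivalent of {date: {$gt: april_1}}
const sinceApril = samplePosts.filter(p => p.date > april1);

// Equivalent of sort({date: -1}).limit(2): newest first, then cut off.
const mostRecent = [...samplePosts].sort((a, b) => b.date - a.date).slice(0, 2);

console.log(sinceApril.map(p => p.author));
console.log(mostRecent.map(p => p.author));
```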
More Querying
Find with regexp:
> db.posts.find({text: /post$/})
Find within array:
> db.posts.find({tags: "intro"})
> db.posts.ensureIndex({tags: 1})
Find within embedded object:
> db.posts.find({"comments.author": "eliot"})
> db.posts.ensureIndex({"comments.author": 1})
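The three queries above rely on matching rules that are worth seeing side by side: a regular expression is tested against the string field, a scalar compared with an array field matches if any element equals it, and a dotted path such as `"comments.author"` looks inside each embedded document. A plain JavaScript approximation over one in-memory document (the variable names are made up, and no database is involved):

```javascript
const storedPost = {
  text: "my blog post",
  tags: ["mongodb", "intro"],
  comments: [{ author: "eliot", text: "great post!" }]
};

// {text: /post$/}: the regular expression is tested against the field value.
const matchesRegex = /post$/.test(storedPost.text);

// {tags: "intro"}: matches when any element of the array equals the value.
const matchesTag = storedPost.tags.includes("intro");

// {"comments.author": "eliot"}: matches when any embedded comment has that author.
const matchesCommentAuthor = storedPost.comments.some(c => c.author === "eliot");

console.log(matchesRegex, matchesTag, matchesCommentAuthor);
```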
More Querying
Counting:
> db.posts.find().count()
> db.posts.find({author: "tony"}).count()
Paging:
> page = 2
> page_size = 15
> db.posts.find().limit(page_size).skip(page * page_size)
Advanced operators: $gt, $lt, $gte, $lte, $ne, $all, $in, $nin, $where
> db.posts.find({$where: "this.author == 'tony' || this.title == 'foo'"})
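The paging query above skips `page * page_size` documents, so with `page = 2` and `page_size = 15` it skips 30 and returns documents 31 to 45. In other words, pages are counted from zero. A small sketch of that arithmetic over an in-memory list; `pageWindow` and `allIds` are hypothetical names used only for illustration.

```javascript
// Compute the skip/limit pair for a zero-based page number.
function pageWindow(page, pageSize) {
  return { skip: page * pageSize, limit: pageSize };
}

// Stand-in for a collection of 50 documents, identified by 1..50.
const allIds = Array.from({ length: 50 }, (_, i) => i + 1);

const { skip, limit } = pageWindow(2, 15);
const pageItems = allIds.slice(skip, skip + limit);

console.log(skip);                            // 30
console.log(pageItems[0]);                    // 31
console.log(pageItems[pageItems.length - 1]); // 45
```

If the user interface numbers pages from 1, subtracting 1 before calling such a helper keeps the first page from being skipped.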
Download MongoDB
http://www.mongodb.org
and let us know what you think: @mongodb
Current 1.6, soon 1.8