A Crash Course in MongoDB

PyCon US 2013

Hi. I’m Andy Dirnberger

Engineering @ CBS Local
@dirnonline

github.com/dirn


So what is MongoDB?

http://mongodb.org

MongoDB is...

‣ Document-oriented
‣ JSON-like (BSON)
‣ Dynamic schema*
‣ Scalable
‣ Open Source (GNU AGPL v3.0)**

*not the same thing as schemaless
**drivers use the Apache license

MongoDB can be used for...

‣ Metrics
‣ Logging*
‣ Messaging Queues
‣ Blog
‣ Content Management
‣ Anything you want

*Capped collections behave as fixed-size FIFO queues
*TTL collections have a special index that will automatically remove old data
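
The FIFO behaviour of a capped collection can be pictured with Python's collections.deque and its maxlen argument. This is only an in-memory analogy written for Python 3: a real capped collection is limited by size in bytes (and optionally by document count), while the deque here is limited by item count, and the log entries are made-up sample data.

```python
from collections import deque

# In-memory stand-in for a capped collection holding at most 3 entries.
log = deque(maxlen=3)

for n in range(1, 6):
    log.append({'seq': n, 'msg': 'request %d' % n})

# The two oldest entries were pushed out to make room for the newest ones.
remaining = [entry['seq'] for entry in log]
print(remaining)
```

As with a capped collection, nothing has to delete old entries explicitly; inserting past the limit is what evicts them.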

To run MongoDB...

Download it:
http://mongodb.org/downloads

or install it:
$ sudo apt-get install mongodb
$ brew install mongodb

Run it:
$ mongod
$ mongod --dbpath /var/lib/mongodb/
$ mongod --fork --logpath /var/log/mongodb.log

http://docs.mongodb.org/manual/tutorial/manage-mongodb-processes/

The driver...

Install it:
$ pip install pymongo

http://api.mongodb.org/python/current/

Packages:
‣ pymongo
‣ bson
‣ gridfs

BSON supports...

‣ int
‣ float
‣ basestring
‣ list
‣ dict
‣ datetime.datetime

http://bsonspec.org/

Object IDs are made of...

‣ 4-byte timestamp (50d4dce7)
‣ 3-byte machine identifier (0ea5fa)
‣ 2-byte process ID (e6fb)
‣ 3-byte counter (84e44b)

50d4dce70ea5fae6fb84e44b
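
The layout listed above can be checked by slicing the example ID apart with nothing but the standard library. This is a minimal Python 3 sketch, not the driver's own parser: split_object_id is a hypothetical helper name, and reading every field as big-endian is an assumption made for display.

```python
from datetime import datetime, timezone


def split_object_id(hex_id):
    """Split a 24-character ObjectId hex string into its four fields."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError('an ObjectId is exactly 12 bytes')
    return {
        'timestamp': int.from_bytes(raw[0:4], 'big'),  # seconds since the epoch
        'machine': raw[4:7].hex(),
        'pid': int.from_bytes(raw[7:9], 'big'),
        'counter': int.from_bytes(raw[9:12], 'big'),
    }


parts = split_object_id('50d4dce70ea5fae6fb84e44b')
# The leading four bytes double as a creation time.
created = datetime.fromtimestamp(parts['timestamp'], tz=timezone.utc)
print(parts['machine'], hex(parts['pid']), hex(parts['counter']))
print(created.isoformat())
```

Because the timestamp comes first, sorting documents by _id also sorts them roughly by creation time.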

Connect with MongoClient

>>> from pymongo import MongoClient
>>>
>>> MongoClient(host='localhost', port=27017)
MongoClient('localhost', 27017)
>>>
>>> MongoClient(host='mongodb://localhost:27017')
MongoClient('localhost', 27017)
>>>
>>> MongoClient('mongodb://localhost:27017').pycon
Database(MongoClient('localhost', 27017), u'pycon')

Querying

Documents can be retrieved with...

>>> db = MongoClient('mongodb://localhost:27017').pycon
>>> coll = db.talks
>>> coll.find_one({'name': 'A Crash Course in MongoDB'})
{u'track': 2,
 u'_id': ObjectId('5145e5380ea5fa321fa97064'),
 u'speaker': u'Andy Dirnberger',
 u'name': u'A Crash Course in MongoDB',
 u'language': u'python',
 u'time': datetime.datetime(2013, 3, 17, 14, 30)}

Documents can be retrieved with...

>>> from datetime import datetime
>>> cursor = coll.find(
...     {'track': 2,
...      'time': {'$gte': datetime(2013, 3, 17),
...               '$lt': datetime(2013, 3, 18)}},
...     {'name': 1})
>>> cursor
<pymongo.cursor.Cursor object at 0x10da4ed90>

http://docs.mongodb.org/manual/reference/operators/#query-selectors
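
To make the selector syntax concrete, here is a toy in-memory matcher in Python 3. It is a sketch of the semantics only, not how the server evaluates queries: it ignores arrays, nested fields and indexes, the function name matches is hypothetical, and two of the three sample talks are made up. Besides the $gte and $lt range selectors used above, it handles $in and $ne from the same reference.

```python
import operator
from datetime import datetime

# Selectors this toy matcher understands.
OPERATORS = {
    '$gte': operator.ge,
    '$lt': operator.lt,
    '$ne': operator.ne,
    '$in': lambda value, choices: value in choices,
}


def matches(doc, spec):
    """Return True if doc satisfies every condition in spec."""
    for field, condition in spec.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                # A missing field never satisfies a range comparison.
                if value is None and op in ('$gte', '$lt'):
                    return False
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            return False
    return True


talks = [
    {'name': 'A Crash Course in MongoDB', 'track': 2, 'language': 'python',
     'time': datetime(2013, 3, 17, 14, 30)},
    {'name': 'Made-up talk A', 'track': 2, 'language': 'ruby',
     'time': datetime(2013, 3, 16, 10, 0)},
    {'name': 'Made-up talk B', 'track': 1, 'language': 'php',
     'time': datetime(2013, 3, 17, 11, 0)},
]

spec = {'track': 2,
        'time': {'$gte': datetime(2013, 3, 17), '$lt': datetime(2013, 3, 18)}}
sunday_track_two = [t['name'] for t in talks if matches(t, spec)]
not_python = [t['name'] for t in talks
              if matches(t, {'language': {'$ne': 'python'}})]
php_or_node = [t['name'] for t in talks
               if matches(t, {'language': {'$in': ['php', 'node.js']}})]
print(sunday_track_two, not_python, php_or_node)
```

All conditions in a query document are combined with an implicit AND, which is why the matcher returns False as soon as one of them fails.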

What’s in the cursor?

>>> for doc in cursor:
...     print doc
...
{u'_id': ObjectId('5145e4f00ea5fa321fa97062'), u'name': u'Elasticsearch (Part 2)'}
{u'_id': ObjectId('5145e5200ea5fa321fa97063'), u'name': u'Going beyond the Django ORM'}
{u'_id': ObjectId('5145e5380ea5fa321fa97064'), u'name': u'A Crash Course in MongoDB'}

http://api.mongodb.org/python/current/api/pymongo/cursor.html

Updating

Documents can be removed with...

>>> coll.remove({'language': 'ruby'})
{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 0}

Documents can be removed with...

>>> coll.remove({'language': {'$in': ['php', 'node.js']}})
{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 0}

Documents can be removed with...

>>> coll.remove({'language': {'$ne': 'python'}})
{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 0}

Documents can be inserted with...

>>> db.tracks.insert({'number': 2, 'room': 'Grand Ballroom CD'})
ObjectId('5145eb4e0ea5fa321fa97065')

Documents can be inserted with...

>>> db.sessions.update(
...     {'track': 2},
...     {'track': 2,
...      'date': datetime(2013, 3, 17),
...      'order': 1,
...      'chair': 'Megan Speir',
...      'runner': 'Erik Bray'},
...     upsert=True)
{ ...
 u'upserted': ObjectId('5145ecfd3f69a773554253e8'),
 u'n': 1,
 u'updatedExisting': False}

A couple of other methods...

save()
Works like update(..., upsert=True) if _id is specified, insert() if it’s not

find_and_modify()
Modifies the document in the database; returns the original by default, or the updated document with new=True

A note about update()

>>> from bson.objectid import ObjectId
>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'num_talks': 3})
{...}
>>>
>>> # The document has been replaced
>>> db.sessions.find_one(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')})
{u'_id': ObjectId('5145ecfd3f69a773554253e8'), u'num_talks': 3}

Using update operators to target specific fields...

>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'$set': {'num_talks': 3}})
{u'updatedExisting': True, u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}

http://docs.mongodb.org/manual/reference/operators/#update
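
The difference between the two update() calls can be modelled on a plain dict. This Python 3 sketch only imitates the behaviour described above; apply_update is a hypothetical helper, it models no operator other than $set, and the integer _id is a stand-in for a real ObjectId.

```python
def apply_update(doc, update):
    """Mimic update(): operators patch fields, a plain document replaces."""
    if any(key.startswith('$') for key in update):
        # Only $set is modelled here; it overwrites the named fields.
        patched = dict(doc)
        patched.update(update.get('$set', {}))
        return patched
    # No operators: everything except _id is replaced.
    replaced = {'_id': doc['_id']}
    replaced.update(update)
    return replaced


session = {'_id': 1, 'track': 2, 'chair': 'Megan Speir', 'runner': 'Erik Bray'}

replaced = apply_update(session, {'num_talks': 3})
patched = apply_update(session, {'$set': {'num_talks': 3}})
print(replaced)
print(patched)
```

The first result keeps only _id and num_talks, while the second keeps track, chair and runner and adds num_talks alongside them.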

Write concern...

w
The number of servers that must acknowledge the write, including the primary

wtimeout
The timeout for the write; without it, the write could block forever

http://docs.mongodb.org/manual/core/write-operations/#write-concern

Write concern...

Is turned on by default in MongoClient (writes are acknowledged, w=1)
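
One way to request a stricter write concern is through the connection string, where the corresponding options are spelled w and wtimeoutMS (a timeout in milliseconds). The Python 3 sketch below only builds such a URI with the standard library; mongo_uri is a hypothetical helper, and the values 2 and 5000 are illustrative rather than recommendations.

```python
from urllib.parse import urlencode


def mongo_uri(host='localhost', port=27017, **options):
    """Build a mongodb:// URI with options appended as a query string."""
    uri = 'mongodb://%s:%d/' % (host, port)
    if options:
        uri += '?' + urlencode(sorted(options.items()))
    return uri


# w=2: the primary plus one more server must acknowledge each write.
# wtimeoutMS=5000: stop waiting for acknowledgement after five seconds.
uri = mongo_uri(w=2, wtimeoutMS=5000)
print(uri)
```

The resulting string can be handed to MongoClient in the same way as the mongodb://localhost:27017 URI in the connection examples.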

Indexes

You can create an index with...

create_index()
Unconditionally creates an index on one or more fields

ensure_index()
Works like create_index() except the driver will “remember” that the index was already made

Indexes...

Are directional
>>> import pymongo
>>> db.sessions.ensure_index([
...     ('date', pymongo.ASCENDING),
...     ('order', pymongo.DESCENDING)])
u'date_1_order_-1'

Can be sparse
Only documents containing all fields in the index will be included in the index

Explain plans...

{
    'cursor': '<Cursor Type and Index>',
    'n': <num (documents matching query)>,
    'nscanned': <num (documents scanned)>,
    'scanAndOrder': <boolean>,
}

http://docs.mongodb.org/manual/reference/explain/

You want n and nscanned to be as close together as possible.
If scanAndOrder is True, the index can’t be used for sorting.
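
Both rules of thumb can be turned into a small check over an explain() result. This is a Python 3 sketch under stated assumptions: review_plan is a hypothetical helper, it reads only the n, nscanned and scanAndOrder keys shown above, and the sample plan uses made-up numbers.

```python
def review_plan(plan):
    """Summarise the fields of an explain() result discussed above."""
    n, nscanned = plan['n'], plan['nscanned']
    return {
        # 1.0 means every scanned document was also returned.
        'selectivity': float(n) / nscanned if nscanned else 1.0,
        # True means the results had to be sorted without help from an index.
        'sorted_without_index': bool(plan.get('scanAndOrder', False)),
    }


# Made-up numbers for a query that scans far more than it returns.
report = review_plan({'cursor': 'BasicCursor', 'n': 3, 'nscanned': 1200,
                      'scanAndOrder': True})
print(report)
```

A selectivity close to 1.0 suggests the index fits the query well; a value near zero, as here, suggests most of the scanned documents were discarded.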

GridFS

Storing files with GridFS...

‣ Files are stored in chunks
‣ 4MB of RAM
‣ Replication and Sharding

http://docs.mongodb.org/manual/applications/gridfs/
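
The idea of storing a file in chunks can be sketched without a database. In GridFS each chunk is a document carrying files_id, a sequence number n and the payload in data; the Python 3 helpers below (split_into_chunks and reassemble, both hypothetical names) mimic that shape in memory. The 4-byte chunk size and the 'example-id' value are illustrative only; real chunks are far larger.

```python
def split_into_chunks(files_id, data, chunk_size):
    """Return chunk documents shaped like those in a GridFS chunks collection."""
    return [
        {'files_id': files_id, 'n': n, 'data': data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Join chunk payloads back together in sequence order."""
    return b''.join(c['data'] for c in sorted(chunks, key=lambda c: c['n']))


payload = b'PyCon 2013'
# A 4-byte chunk size is unrealistically small; it keeps the output short.
chunks = split_into_chunks('example-id', payload, chunk_size=4)
print([c['data'] for c in chunks])
print(reassemble(chunks))
```

Because each chunk is an ordinary document, a reader can fetch only the chunks it needs instead of loading the whole file at once.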

To use GridFS...

>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> file_id = fs.put('PyCon 2013', city='Santa Clara', state='CA')
>>> file = fs.get(file_id)
>>> file.read()
'PyCon 2013'
>>> file.upload_date
datetime.datetime(2013, 3, 17, 21, 30, 0, 0)
>>> file.city, file.state
(u'Santa Clara', u'CA')

GridFS is versioned...

get_last_version()
Gets the most recent file matching the query

get_version()
Works like get_last_version() except it can request specific versions of a file

Geospatial

Create an index...

>>> # $set adds loc without replacing the rest of the document
>>> db.tracks.update(
...     {'_id': ObjectId('5145eb4e0ea5fa321fa97065')},
...     {'$set': {'loc': [37.3542, 121.9542]}})
{...}
>>> db.tracks.ensure_index([('loc', pymongo.GEO2D)])
u'loc_2d'

http://docs.mongodb.org/manual/applications/geospatial-indexes/

Query, query, query...

>>> db.tracks.find({'loc': [37.3542, 121.9542]})
<pymongo.cursor.Cursor object at 0x10e14eb90>
>>> db.tracks.find({'loc': {'$near': [37.3542, 121.9542]}})
<pymongo.cursor.Cursor object at 0x10e14edd0>

You can query $within shapes...

‣ {'$center': [center, radius]}
‣ {'$box': [[x1, y1], [x2, y2]]}
‣ {'$polygon': [[x1, y1], [x2, y2], [x3, y3]]}
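
What the first two shapes test can be written out in plain Python 3. This sketch treats coordinates as points on a flat plane and assumes the $box corners are given as bottom-left then top-right; in_center and in_box are hypothetical helper names, and the centre, radius and corner values are made up. $polygon is left out to keep the example short.

```python
import math


def in_center(point, center, radius):
    """True if point lies within radius of center, using flat geometry."""
    return math.hypot(point[0] - center[0], point[1] - center[1]) <= radius


def in_box(point, bottom_left, top_right):
    """True if point lies inside the rectangle spanned by the two corners."""
    return (bottom_left[0] <= point[0] <= top_right[0] and
            bottom_left[1] <= point[1] <= top_right[1])


loc = [37.3542, 121.9542]
near_circle = in_center(loc, [37.0, 122.0], 0.5)
inside_box = in_box(loc, [37.0, 121.0], [38.0, 122.0])
outside_box = in_box(loc, [38.0, 121.0], [39.0, 122.0])
print(near_circle, inside_box, outside_box)
```

In a real query the server performs these tests with the help of the 2d index instead of checking every document.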

Anything else...

Aggregation Framework
Helps with simple map reduce queries, but is subject to the same 16MB limit as documents

Libraries
http://api.mongodb.org/python/current/tools.html

Thank you!
dirn.it/PyCon2013

Questions?