A Crash Course in MongoDB

PyCon 2013 - A Crash Course in MongoDB

  • A Crash Course in MongoDB

    PyCon US 2013

  • hi. I'm Andy Dirnberger

    Engineering @ CBS Local

    @dirnonline

    github.com/dirn

    dirn@dirnonline.com

  • So what is MongoDB?

    http://mongodb.org

  • MongoDB is...

      • Document-oriented
      • JSON-like (BSON)
      • Dynamic schema*
      • Scalable
      • Open Source (GNU AGPL v3.0)**

    *not the same thing as schemaless

    **drivers use the Apache license

  • MongoDB can be used for... Metrics Logging* Messaging Queues Blog Content Management Anything you want

    *Capped collections behave as fixed-size FIFO queues

    *TTL collections have a special index that automatically removes old data
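
The FIFO behaviour in the footnote can be pictured with a bounded deque from the standard library. This is only an in-memory analogy, not MongoDB code: a real capped collection is sized in bytes (with an optional document limit), and the `maxlen=3` used here is an arbitrary illustrative value.

```python
from collections import deque

# A bounded deque as an analogy for a capped collection:
# once the fixed size is reached, the oldest entry is discarded.
log = deque(maxlen=3)  # maxlen=3 is an arbitrary illustrative size

for n in range(1, 6):
    log.append({'event': n})

# Only the most recent entries are still present.
remaining = [entry['event'] for entry in log]
print(remaining)
```

Appending never fails when the structure is full; the oldest entries are simply dropped, which is why this shape suits logging.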

  • To run MongoDB...

    Download it:

        http://mongodb.org/downloads

    or install it:

        $ sudo apt-get install mongodb
        $ brew install mongodb

    Run it:

        $ mongod
        $ mongod --dbpath /var/lib/mongodb/
        $ mongod --fork

    http://docs.mongodb.org/manual/tutorial/manage-mongodb-processes/

  • Using MongoDB with Python: PyMongo

    https://github.com/mongodb/mongo-python-driver

  • The driver...

    Install it:

        $ pip install pymongo

    http://api.mongodb.org/python/current/

    Packages:

      • pymongo
      • bson
      • gridfs

  • BSON supports...

      • int
      • float
      • basestring
      • list
      • dict
      • datetime.datetime

    http://bsonspec.org/

  • Object IDs are made of...

      • 4-byte timestamp (50d4dce7)
      • 3-byte machine identifier (0ea5fa)
      • 2-byte process ID (e6fb)
      • 3-byte counter (84e44b)

    50d4dce70ea5fae6fb84e44b
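
As a rough illustration of the layout on this slide, the 24-character hex string can be sliced into its four fields with the standard library alone. The helper name `split_object_id` is made up for this sketch; it follows the field widths listed on the slide and does not use the `bson` package.

```python
from datetime import datetime, timezone

def split_object_id(hex_id):
    """Split a 24-character hex ObjectId into the four fields from the slide."""
    if len(hex_id) != 24:
        raise ValueError('expected 24 hex characters')
    return {
        'timestamp': int(hex_id[0:8], 16),  # 4 bytes, seconds since the Unix epoch
        'machine': hex_id[8:14],            # 3 bytes
        'pid': hex_id[14:18],               # 2 bytes
        'counter': hex_id[18:24],           # 3 bytes
    }

parts = split_object_id('50d4dce70ea5fae6fb84e44b')
# The leading timestamp doubles as a creation time for the document.
created = datetime.fromtimestamp(parts['timestamp'], tz=timezone.utc)
print(parts, created.isoformat())
```

Because the timestamp comes first, sorting on `_id` roughly sorts by creation time.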

  • Connect with MongoClient

    >>> from pymongo import MongoClient
    >>>
    >>> MongoClient(host='localhost', port=27017)
    MongoClient('localhost', 27017)
    >>>
    >>> MongoClient(host='mongodb://localhost:27017')
    MongoClient('localhost', 27017)
    >>>
    >>> MongoClient('mongodb://localhost:27017').pycon
    Database(MongoClient('localhost', 27017), u'pycon')

  • Querying

  • Documents can be retrieved with...

    >>> coll = db.talks
    >>> coll.find_one({
    ...     'name': 'A Crash Course in MongoDB'})
    {
        u'track': 2,
        u'_id': ObjectId('5145e5380ea5fa321fa97064'),
        u'speaker': u'Andy Dirnberger',
        u'name': u'A Crash Course in MongoDB',
        u'language': u'python',
        u'time': datetime.datetime(2013, 3, 17, 14, 30)
    }

  • Documents can be retrieved with...

    >>> from datetime import datetime
    >>> cursor = coll.find({
    ...     'track': 2,
    ...     'time': {'$gte': datetime(2013, 3, 17),
    ...              '$lt': datetime(2013, 3, 18)}},
    ...     {'name': 1})

    http://docs.mongodb.org/manual/reference/operators/#query-selectors
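
To make the selector semantics concrete, here is a minimal pure-Python sketch of how a filter like the one above decides whether a document matches. The `matches` helper and the two extra talks are hypothetical; the real matching happens inside the server and supports many more operators than the four handled here.

```python
from datetime import datetime

# A small subset of MongoDB's query selectors, re-implemented for illustration.
OPERATORS = {
    '$gte': lambda value, arg: value is not None and value >= arg,
    '$lt': lambda value, arg: value is not None and value < arg,
    '$ne': lambda value, arg: value != arg,
    '$in': lambda value, arg: value in arg,
}

def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {'$gte': a, '$lt': b}: every operator must hold.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain form: exact equality.
            return False
    return True

talks = [
    {'name': 'A Crash Course in MongoDB', 'track': 2,
     'time': datetime(2013, 3, 17, 14, 30)},
    {'name': 'Hypothetical talk A', 'track': 2,
     'time': datetime(2013, 3, 16, 10, 0)},
    {'name': 'Hypothetical talk B', 'track': 1,
     'time': datetime(2013, 3, 17, 11, 0)},
]
query = {'track': 2,
         'time': {'$gte': datetime(2013, 3, 17), '$lt': datetime(2013, 3, 18)}}
found = [talk['name'] for talk in talks if matches(talk, query)]
print(found)
```

All conditions in one query document are combined with an implicit AND, which is why only the talk in track 2 on March 17 is kept.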

  • What's in the cursor?

    >>> for doc in cursor:
    ...     print doc
    ...
    {u'_id': ObjectId('5145e4f00ea5fa321fa97062'), u'name': u'Elasticsearch (Part 2)'}
    {u'_id': ObjectId('5145e5200ea5fa321fa97063'), u'name': u'Going beyond the Django ORM'}
    {u'_id': ObjectId('5145e5380ea5fa321fa97064'), u'name': u'A Crash Course in MongoDB'}

    http://api.mongodb.org/python/current/api/pymongo/cursor.html

  • Updating

  • Documents can be removed with...

    >>> coll.remove({'language': 'ruby'})
    {u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 0}

  • Documents can be removed with...

    >>> coll.remove({
    ...     'language': {'$in': ['php', 'node.js']}})
    {u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 0}

  • Documents can be removed with...

    >>> coll.remove({'language': {'$ne': 'python'}})
    {u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 0}

  • Documents can be inserted with...

    >>> db.tracks.insert({
    ...     'number': 2,
    ...     'room': 'Grand Ballroom CD'})
    ObjectId('5145eb4e0ea5fa321fa97065')

  • Documents can be inserted with...

    >>> db.sessions.update(
    ...     {'track': 2},
    ...     {'track': 2,
    ...      'date': datetime(2013, 3, 17),
    ...      'order': 1,
    ...      'chair': 'Megan Speir',
    ...      'runner': 'Erik Bray'},
    ...     upsert=True)
    {
        ...
        u'upserted': ObjectId('5145ecfd3f69a773554253e8'),
        u'n': 1,
        u'updatedExisting': False
    }

  • A couple of other methods...

    save()

    Works like update(..., upsert=True) if _id is specified, insert() if it's not

    find_and_modify()

    Modifies the document in the database; returns the original by default, or the updated document with new=True

  • A note about update()

    >>> db.sessions.update(
    ...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
    ...     {'num_talks': 3})
    {...}
    >>>
    >>> # The document has been replaced
    >>> db.sessions.find_one({
    ...     '_id': ObjectId('5145ecfd3f69a773554253e8')})
    {
        u'_id': ObjectId('5145ecfd3f69a773554253e8'),
        u'num_talks': 3
    }

  • Using update operators to target specific fields...

    >>> db.sessions.update(
    ...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
    ...     {'$set': {'num_talks': 3}})
    {
        u'updatedExisting': True,
        u'connectionId': 8,
        u'ok': 1.0,
        u'err': None,
        u'n': 1
    }

    http://docs.mongodb.org/manual/reference/operators/#update
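
The difference between the two update styles can be mimicked on a plain dict. This is a sketch only: `apply_update` is a hypothetical helper that handles nothing beyond `$set`, and the integer `_id` is a stand-in for a real ObjectId.

```python
def apply_update(doc, update):
    """Mimic the two update() styles from the slides on a plain dict."""
    if any(key.startswith('$') for key in update):
        # Operator form: only the targeted fields change.
        result = dict(doc)
        result.update(update.get('$set', {}))
        return result
    # Plain form: the whole document is replaced; only _id survives.
    result = {'_id': doc['_id']}
    result.update(update)
    return result

session = {'_id': 1, 'track': 2, 'chair': 'Megan Speir', 'runner': 'Erik Bray'}

replaced = apply_update(session, {'num_talks': 3})
patched = apply_update(session, {'$set': {'num_talks': 3}})
print(replaced)
print(patched)
```

The first call loses `track`, `chair` and `runner`; the second keeps them and only adds `num_talks`.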

  • Write concern...

    w

    The number of servers that must acknowledge the write, including the primary

    wtimeout

    The timeout for the write, in milliseconds; without it, the write could block forever

    http://docs.mongodb.org/manual/core/write-operations/#write-concern
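
One way to carry these settings is in the connection string, where they appear as the `w` and `wtimeoutMS` options. The sketch below only builds such a URI with the standard library; `build_uri` is a hypothetical helper, the values 2 and 5000 are illustrative, and nothing here connects to a server.

```python
from urllib.parse import urlencode

def build_uri(host, port, **options):
    """Assemble a mongodb:// URI with optional query-string options."""
    uri = 'mongodb://%s:%d/' % (host, port)
    if options:
        # Sort the options so the output is deterministic.
        uri += '?' + urlencode(sorted(options.items()))
    return uri

# Wait for two servers to acknowledge each write, for at most five seconds.
uri = build_uri('localhost', 27017, w=2, wtimeoutMS=5000)
print(uri)
```

The resulting string would then be handed to `MongoClient`, in the same way as the URI on the connection slide.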

  • Write concern...

    is turned on by default in MongoClient

  • Indexes

  • You can create an index with...

    create_index()

    Unconditionally creates an index on one or more fields

    ensure_index()

    Works like create_index() except the driver will remember that the index was already made

  • Indexes...

    Are directional

        >>> db.sessions.ensure_index([
        ...     ('date', pymongo.ASCENDING),
        ...     ('order', pymongo.DESCENDING)])
        u'date_1_order_-1'

    Can be sparse

        Only documents containing all fields in the index will be included in the index
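
The returned index name encodes each field and its direction. A small sketch of how such a name can be derived from the key list is shown below; `index_name` is a hypothetical helper, not the driver's own function, and the constants are redefined locally with the values the `pymongo` constants carry.

```python
# Local copies of the values behind pymongo.ASCENDING, DESCENDING and GEO2D.
ASCENDING, DESCENDING, GEO2D = 1, -1, '2d'

def index_name(keys):
    """Join (field, direction) pairs into a default-style index name."""
    return '_'.join('%s_%s' % (field, direction) for field, direction in keys)

compound = index_name([('date', ASCENDING), ('order', DESCENDING)])
geo = index_name([('loc', GEO2D)])
print(compound, geo)
```

Reading a name such as `date_1_order_-1` back tells you both the field order and the sort direction of each field in the index.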

  • Explain plans...

    { 'cursor' : '', 'n' : , 'nscanned': , 'scanAndOrder': ,}

    http://docs.mongodb.org/manual/reference/explain/

    You want n and nscanned to be as close together as possible

    If scanAndOrder is True, the index can't be used for sorting
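
A quick way to apply this advice is to compare the two counters directly. In the sketch below both plans are made-up dictionaries with invented numbers, shaped like the fields named on the slide, and `scan_efficiency` is a hypothetical helper.

```python
def scan_efficiency(plan):
    """Ratio of returned to scanned documents; 1.0 means no wasted scanning."""
    if plan['nscanned'] == 0:
        return 1.0
    return plan['n'] / float(plan['nscanned'])

# Made-up explain output for an indexed and an unindexed query.
indexed = {'cursor': 'BtreeCursor date_1_order_-1',
           'n': 3, 'nscanned': 3, 'scanAndOrder': False}
unindexed = {'cursor': 'BasicCursor',
             'n': 3, 'nscanned': 1200, 'scanAndOrder': True}

for plan in (indexed, unindexed):
    print(plan['cursor'], scan_efficiency(plan), plan['scanAndOrder'])
```

A ratio far below 1.0, or `scanAndOrder` set to True, suggests the query would benefit from a better index.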

  • GridFS

  • Storing files with GridFS...

      • Files are stored in chunks
      • 4MB of RAM
      • Replication and Sharding

    http://docs.mongodb.org/manual/applications/gridfs/
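
The chunking idea can be sketched without a database. Here `split_into_chunks` is a hypothetical helper and the chunk size of 4 bytes is deliberately tiny for illustration; GridFS itself uses a far larger default and also stores a reference to the owning file in each chunk document.

```python
def split_into_chunks(data, chunk_size):
    """Cut a byte string into numbered pieces, the way GridFS numbers chunks with n."""
    return [
        {'n': index, 'data': data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]

chunks = split_into_chunks(b'PyCon 2013', chunk_size=4)

# Reading the file back is a matter of joining the chunks in order of n.
restored = b''.join(chunk['data'] for chunk in sorted(chunks, key=lambda c: c['n']))
print(len(chunks), restored)
```

Because each chunk is an ordinary document, a stored file is replicated and sharded like any other data.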

  • To use GridFS...

    >>> import gridfs
    >>> fs = gridfs.GridFS(db)
    >>> file_id = fs.put('PyCon 2013', city='Santa Clara', state='CA')
    >>> file = fs.get(file_id)
    >>> file.read()
    'PyCon 2013'
    >>> file.upload_date
    datetime.datetime(2013, 3, 17, 21, 30, 0, 0)
    >>> file.city, file.state
    (u'Santa Clara', u'CA')

  • GridFS is versioned...

    get_last_version()

    Gets the most recent file matching the query

    get_version()

    Works like get_last_version() except it can request specific versions of a file
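
Version numbers are relative to upload order: 0 is the first upload and -1 the most recent. The sketch below imitates that selection over a list of made-up file records; `pick_version`, the filename and the `note` field are hypothetical, and only `uploadDate` mirrors a field GridFS stores.

```python
from datetime import datetime

def pick_version(uploads, version=-1):
    """Pick an upload by version: 0 is the oldest, -1 the newest."""
    ordered = sorted(uploads, key=lambda upload: upload['uploadDate'])
    return ordered[version]

# Three made-up uploads sharing one filename.
uploads = [
    {'filename': 'slides.pdf', 'uploadDate': datetime(2013, 3, 15), 'note': 'draft'},
    {'filename': 'slides.pdf', 'uploadDate': datetime(2013, 3, 17), 'note': 'final'},
    {'filename': 'slides.pdf', 'uploadDate': datetime(2013, 3, 16), 'note': 'review'},
]

latest = pick_version(uploads)     # what get_last_version() would correspond to
first = pick_version(uploads, 0)   # the original upload
print(latest['note'], first['note'])
```

Nothing is overwritten when a file with the same name is stored again; older versions stay available until they are deleted.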

  • Geospatial

  • Create an index...

    >>> # $set adds loc without replacing the rest of the document
    >>> db.tracks.update(
    ...     {'_id': ObjectId('5145eb4e0ea5fa321fa97065')},
    ...     {'$set': {'loc': [37.3542, 121.9542]}})
    {...}
    >>> db.tracks.ensure_index([
    ...     ('loc', pymongo.GEO2D)])
    u'loc_2d'

    http://docs.mongodb.org/manual/applications/geospatial-indexes/

  • Query, query, query...

    >>> db.tracks.find({'loc': [37.3542, 121.9542]})

    >>> db.tracks.find({ 'loc': {'$near': [37.3542, 121.9542]}})

  • You can query $within shapes...

    {'$center': [center, radius]}

    {'$box': [[x1, y1], [x2, y2]]}

    {'$polygon': [[x1, y1], [x2, y2], [x3, y3]]}
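
For intuition, the circle and box tests can be written out in plain Python. This sketch assumes flat (planar) geometry, treats the box corners as opposite corners in either order, and uses hypothetical helper names; it is not how the server evaluates the query.

```python
import math

def within_center(point, center, radius):
    """True if point lies inside or on a circle, using flat geometry."""
    return math.hypot(point[0] - center[0], point[1] - center[1]) <= radius

def within_box(point, corner_a, corner_b):
    """True if point lies inside or on the rectangle spanned by two corners."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    return (min(x1, x2) <= point[0] <= max(x1, x2) and
            min(y1, y2) <= point[1] <= max(y1, y2))

loc = [37.3542, 121.9542]  # the location stored on the track document

near_circle = within_center(loc, [37, 122], 1)
far_circle = within_center(loc, [0, 0], 1)
in_box = within_box(loc, [37, 121], [38, 122])
print(near_circle, far_circle, in_box)
```

Each shape in the list above is passed as the value of `$within` for the indexed field, and only documents whose point falls inside the shape are returned.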

  • Anything else...

    Aggregation Framework

    Helps with simple map reduce queries, but is subject to the same 16MB limit as documents

    Libraries

    http://api.mongodb.org/python/current/tools.html

  • Thank you!

    dirn.it/PyCon2013

    Questions?