Rapid and Scalable Development with MongoDB, PyMongo, and Ming

© 2011 Geeknet Inc. Rick Copeland (@rick446), [email protected]

Description

This talk, given at PyGotham 2011, teaches techniques for using the popular NoSQL database MongoDB with the Python library Ming to write maintainable, high-performance, and scalable applications. We will cover everything you need to become an effective Ming/MongoDB developer, from basic PyMongo queries to high-level object-document mapping in Ming.

Transcript of Rapid and Scalable Development with MongoDB, PyMongo, and Ming

  • 1. Rapid and Scalable Development with MongoDB, PyMongo, and Ming (Rick Copeland, @rick446, [email_address])

2.

  • SourceForge and MongoDB
  • Get started with PyMongo
  • Sprinkle in some Ming schemas
  • ORM: When a dict just won't do
  • What we are learning

3. SourceForge's MongoDB

  • Tried CouchDB: liked the dev model, not so much the performance
  • Migrated consumer-facing pages (summary, browse, download) to MongoDB and it worked great (on MongoDB 0.8 no less!)
  • All our new stuff uses MongoDB (Allura, Zarkov, Ming, …)

4. What is MongoDB? MongoDB (from "humongous") is a scalable, high-performance, open source, document-oriented database.

  • Scalable: sharding and replication
  • High-performance: 20k inserts/s? No problem
  • Document-oriented: hierarchical JSON-like store, easy to develop apps against
  • Open source: SourceForge. Yeah. We like FOSS.

5. MongoDB to Relational Mental Mapping

  • Rows are flat, documents are nested
  • Typing: SQL is static, MongoDB is dynamic

Relational (SQL) → MongoDB

  • Database → Database
  • Table → Collection
  • Index → Index
  • Row → Document
  • Column → Field

6.

  • SourceForge and MongoDB
  • Get started with PyMongo
  • Sprinkle in some Ming schemas
  • ORM: When a dict just won't do
  • What we are learning

7. PyMongo: Getting Started

  • >>> import pymongo
  • >>> conn = pymongo.Connection()
  • >>> conn
  • Connection('localhost', 27017)
  • >>> conn.test
  • Database(Connection('localhost', 27017), u'test')
  • >>> conn.test.foo
  • Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
  • >>> conn['test-db']
  • Database(Connection('localhost', 27017), u'test-db')
  • >>> conn['test-db']['foo-collection']
  • Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
  • >>> conn.test.foo.bar.baz
  • Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')

8. PyMongo: Insert / Update / Delete

  • >>> db = conn.test
  • >>> id = db.foo.insert({'bar': 1, 'baz': [1, 2, {'k': 5}]})
  • >>> id
  • ObjectId('4e712e21eb033009fa000000')
  • >>> db.foo.find()
  • <pymongo.cursor.Cursor object at 0x...>
  • >>> list(db.foo.find())
  • [{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}]
  • >>> db.foo.update({'_id': id}, {'$set': {'bar': 2}})
  • >>> db.foo.find().next()
  • {u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}
  • >>> db.foo.remove({'_id': id})
  • >>> list(db.foo.find())
  • [ ]

9. PyMongo: Queries, Indexes

  • >>> db.foo.insert([dict(x=x) for x in range(10)])
  • [ObjectId('4e71313aeb033009fa00000b'), …]
  • >>> list(db.foo.find({'x': {'$gt': 3}}))
  • [{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
  •  {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
  •  {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, …]
  • >>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}))
  • [{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8},
  •  {u'x': 9}]
  • >>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0})
  •              .skip(1).limit(2))
  • [{u'x': 5}, {u'x': 6}]
  • >>> db.foo.ensure_index([
  •     ('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])
  • u'x_1_y_-1'
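
As a quick (hedged) check that the new index actually gets used, pymongo's Cursor.explain() reports the chosen plan; the 'BtreeCursor' form shown here is what MongoDB servers of this era return, and newer servers use a different explain format:

  >>> db.foo.find({'x': {'$gt': 3}}).explain()['cursor']
  u'BtreeCursor x_1_y_-1'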

10. PyMongo: Aggregation et al.

  • You gotta write JavaScript (for now)
  • It's pretty slow (single-threaded JS engine)
  • JavaScript is used by
    • $where in a query
    • .group(key, condition, initial, reduce, finalize=None)
    • .map_reduce(map, reduce, out, finalize=None, …)
  • If you shard, you can get some parallelism across multiple mongod instances with .map_reduce() (and possibly $where). Otherwise you're single-threaded. (A map/reduce sketch follows below.)
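
As a concrete (and hedged) sketch of the map/reduce API mentioned above, here is a run against the db.foo collection from slide 9, counting documents by whether x is even or odd. The bson.code.Code import and the 'foo_counts' output collection name are assumptions about your setup, not something from the talk:

  >>> from bson.code import Code   # pymongo passes JavaScript snippets as Code objects
  >>> map_fn = Code("function () { emit(this.x % 2, 1); }")
  >>> reduce_fn = Code("function (key, values) { return Array.sum(values); }")
  >>> result = db.foo.map_reduce(map_fn, reduce_fn, out='foo_counts')
  >>> list(result.find())   # one doc per key, e.g. {_id: 0, value: 5} and {_id: 1, value: 5}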

11. PyMongo: GridFS

>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> with fs.new_file() as fp:
...     fp.write('The file')
...
>>> fp
<gridfs.grid_file.GridIn object at 0x...>
>>> fp._id
ObjectId('4e727f64eb03300c0b000003')
>>> fs.get(fp._id).read()
'The file'

  • Arbitrary data can be attached to the fp object; it's just a document (see the sketch below)
    • Mime type
    • Filename
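
A hedged sketch of attaching that extra data: keyword arguments to new_file() (or put()) become fields on the GridFS file document and come back as attributes on the object returned by fs.get(). The 'author' field here is purely illustrative:

  >>> with fs.new_file(filename='hello.txt', content_type='text/plain',
  ...                  author='rick') as fp:
  ...     fp.write('The file')
  ...
  >>> out = fs.get(fp._id)
  >>> out.filename, out.content_type, out.author
  (u'hello.txt', u'text/plain', u'rick')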

12. PyMongo: GridFS Versioning

>>> file_id = fs.put('Moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Moar data!'
>>> file_id = fs.put('Even moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Even moar data!'
>>> fs.get_version('foo.txt', -2).read()
'Moar data!'
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[]

13.

  • SourceForge and MongoDB
  • Get started with PyMongo
  • Sprinkle in some Ming schemas
  • ORM: When a dict just won't do
  • What we are learning

14. Why Ming?

  • Your data has a schema
    • Your database can define and enforce it
    • It can live in your application (as with MongoDB)
    • Nice to have the schema defined in one place in the code
  • Sometimes you need a migration
    • Changing the structure/meaning of fields
    • Adding indexes, particularly unique indexes
    • Sometimes lazy, sometimes eager
  • Unit of work: Queuing up all your updates can be handy
  • Python dicts are nice; objects are nicer
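
A minimal standalone-validation sketch of the "schema in one place" idea, assuming ming.schema.Object/String and the SchemaItem.validate() method (details vary a bit by Ming version); the misspelled key mirrors the error you'll see again on slide 18:

  >>> from ming import schema
  >>> page = schema.Object(dict(title=schema.String(), text=schema.String()))
  >>> page.validate(dict(title='Cats', text='...'))    # returns the validated dict
  >>> page.validate(dict(tietul='LOL', text='oops'))   # raises ming.schema.Invalid: Extra keys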

15. Ming: Engines & Sessions

>>> import ming.datastore
>>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')
>>> ds.db
Database(Connection('localhost', 27017), u'test')
>>> session = ming.Session(ds)
>>> session.db
Database(Connection('localhost', 27017), u'test')
>>> ming.configure(**{
...     'ming.main.master': 'mongodb://localhost:27017',
...     'ming.main.database': 'test'})
>>> Session.by_name('main').db
Database(Connection(u'localhost', 27017), u'test')

16. Ming: Define Your Schema

  • from ming import collection, schema, Field
  • WikiDoc = collection('wiki_page', session,
  •     Field('_id', schema.ObjectId()),
  •     Field('title', str, index=True),
  •     Field('text', str))
  • CommentDoc = collection('comment', session,
  •     Field('_id', schema.ObjectId()),
  •     Field('page_id', schema.ObjectId(), index=True),
  •     Field('text', str))

17. Ming: Define Your Schema (once more, with feeling)

  • from ming import Document, Session, Field
  • class WikiDoc(Document):
  •     class __mongometa__:
  •         session = Session.by_name('main')
  •         name = 'wiki_page'
  •         indexes = ['title']
  •     title = Field(str)
  •     text = Field(str)
  • The old declarative syntax continues to exist and is still supported, but it's not being actively improved

18. Ming: Use Your Schema

  • >>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
  • >>> doc.m.save()
  • >>> WikiDoc.m.find()
  • >>> WikiDoc.m.find().all()
  • [{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
  • >>> WikiDoc.m.find().one().text
  • u'I can haz cheezburger?'
  • >>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
  • >>> doc.m.save()
  • Traceback (most recent call last):
  •   File "<stdin>", line 1, in <module>
  •     ...
  • ming.schema.Invalid: <...>: Extra keys: set(['tietul'])

19. Ming: Adding Your Own Types

  • Not usually necessary, built-in SchemaItems provide BSON types, default values, etc.

class ForceInt(ming.schema.FancySchemaItem):
    def _validate(self, value):
        try:
            return int(value)
        except TypeError:
            raise Invalid('Bad value %s' % value, value, None)

20. Ming Bonus: Mongo-in-Memory

>>> ming.datastore.DataStore('mim://', database='test').db
mim.Database(test)

  • MongoDB is (generally) fast
    • except when creating databases
    • particularly when you preallocate
  • Unit tests like things to be isolated
  • MIM gives you isolation at the expense of speed & scaling
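
A sketch of how MIM-based isolation can look in a test suite; the test case, collection name, and cleanup strategy here are illustrative assumptions, and MIM implements most (but not all) of the pymongo API:

  import unittest
  import ming
  import ming.datastore

  class WikiPageTest(unittest.TestCase):
      def setUp(self):
          # In-memory stand-in for MongoDB: no server, no preallocation.
          # (MIM state is process-wide, so clear collections between tests as needed.)
          ds = ming.datastore.DataStore('mim://', database='test')
          self.session = ming.Session(ds)

      def test_insert(self):
          self.session.db.wiki_page.insert(dict(title='Cats', text='...'))
          self.assertEqual(len(list(self.session.db.wiki_page.find())), 1)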

21.

  • SourceForge and MongoDB
  • Get started with PyMongo
  • Sprinkle in some Ming schemas
  • ORM: When a dict just won't do
  • What we are learning

22. Ming ORM: Classes and Collections

from ming import collection, schema, Field
from ming.orm import (mapper, Mapper, RelationProperty, ForeignIdProperty)

WikiDoc = collection('wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))
CommentDoc = collection('comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))

class WikiPage(object): pass
class Comment(object): pass

ormsession.mapper(WikiPage, WikiDoc, properties=dict(
    comments=RelationProperty('Comment')))
ormsession.mapper(Comment, CommentDoc, properties=dict(
    page_id=ForeignIdProperty('WikiPage'),
    page=RelationProperty('WikiPage')))
Mapper.compile_all()

23. Ming ORM: Classes and Collections (declarative)

class WikiPage(MappedClass):
    class __mongometa__:
        session = main_orm_session
        name = 'wiki_page'
        indexes = ['title']
    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)

class CommentDoc(MappedClass):
    class __mongometa__:
        session = main_orm_session
        name = 'comment'
        indexes = ['page_id']
    _id = FieldProperty(S.ObjectId)
    page_id = ForeignIdProperty(WikiPage)
    page = RelationProperty(WikiPage)
    text = FieldProperty(str)

24. Ming ORM: Sessions and Queries

  • Session → ORMSession
  • My_collection.m → My_mapped_class.query
  • ORMSession actually does stuff
    • Track object identity
    • Track object modifications
    • Unit of work: flushing all changes at once

>>> pg = WikiPage(title='MyPage', text='is here')
>>> session.db.wiki_page.count()
0
>>> main_orm_session.flush()
>>> session.db.wiki_page.count()
1

25. Ming ORM: Extending the Session

  • Various plug points in the session
    • before_flush
    • after_flush
  • Some uses
    • Logging changes to sensitive data or for analytics purposes
    • Full-text search indexing
    • "Last modified" fields (see the sketch below)
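
A rough sketch of such an extension. The SessionExtension base class and the before_flush/after_flush hook names come from Ming, but the exact import path (ming.odm vs the older ming.orm) and hook signatures have shifted between releases, so treat this as a starting point:

  import logging

  from ming.odm import SessionExtension   # older Ming releases: ming.orm

  log = logging.getLogger(__name__)

  class AuditExtension(SessionExtension):
      """Log every flush; the same hooks could stamp 'last modified' fields
      or feed a full-text indexer."""

      def before_flush(self, obj=None):
          log.info('about to flush %r', obj)

      def after_flush(self, obj=None):
          log.info('flushed %r', obj)

  # Registered by listing the class in the ORM session's extensions, e.g.
  # ORMSession(..., extensions=[AuditExtension]).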

26.

  • SourceForge and MongoDB
  • Get started with PyMongo
  • Sprinkle in some Ming schemas
  • ORM: When a dict just won't do
  • What we are learning

27. Tips From the Trenches

  • Watch your document size
  • Choose your indexes well
    • Watch your server log; bad queries show up there
  • Don't go crazy with denormalization
    • Try to use an index if all you need is a backref
    • Stale data is a tricky problem
  • Try to stay with one database
  • Watch the # of queries
  • Drop to lower levels (ORM → document → pymongo) when performance is an issue; see the sketch below
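
A hedged illustration of those three levels, reusing the WikiPage / WikiDoc / session names from the earlier slides (the query itself is just an example):

  # ORM level: mapped objects, identity map, unit of work.
  pages = WikiPage.query.find(dict(title='MyPage')).all()

  # Document level: validated dict-like documents, no ORM bookkeeping.
  docs = WikiDoc.m.find(dict(title='MyPage')).all()

  # Raw pymongo: plain dicts, the fastest path when that's all you need.
  raw = list(session.db.wiki_page.find({'title': 'MyPage'}))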

28. Future Work

  • Performance
  • Analytics in MongoDB: Zarkov
  • Web framework integration
  • Magic Columns (?)
  • ???

29. Related Projects

  • Ming: http://sf.net/projects/merciless/ (MIT License)
  • Zarkov: http://sf.net/p/zarkov/ (Apache License)
  • Allura: http://sf.net/p/allura/ (Apache License)
  • PyMongo: http://api.mongodb.org/python (Apache License)

30. Rick Copeland @rick446 [email_address]