Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Transcript of Rapid and Scalable Development with MongoDB, PyMongo, and Ming
- 1. Rapid and Scalable Development with MongoDB, PyMongo, and Ming. Rick Copeland, @rick446, [email_address]
2.
- SourceForge and MongoDB
- Get started with PyMongo
- Sprinkle in some Ming schemas
- ORM: When a dict just won't do
- What we are learning
3. SourceForge's MongoDB
- Tried CouchDB: liked the dev model, not so much the performance
- Migrated consumer-facing pages (summary, browse, download) to MongoDB and it worked great (on MongoDB 0.8 no less!)
- All our new stuff uses MongoDB (Allura, Zarkov, Ming, ...)
4. What is MongoDB?
MongoDB (from "humongous") is a scalable, high-performance, open source, document-oriented database.
- Sharding, replication
- 20k inserts/s? No problem
- Hierarchical JSON-like store: easy to develop apps
- SourceForge. Yeah. We like FOSS
5. MongoDB to Relational Mental Mapping
- Rows are flat, documents are nested
- Typing: SQL is static, MongoDB is dynamic
Relational (SQL) → MongoDB
- Database → Database
- Table → Collection
- Index → Index
- Row → Document
- Column → Field
6.
- SourceForge and MongoDB
- Get started with PyMongo
- Sprinkle in some Ming schemas
- ORM: When a dict just won't do
- What we are learning
7. PyMongo: Getting Started
>>> import pymongo
>>> conn = pymongo.Connection()
>>> conn
Connection('localhost', 27017)
>>> conn.test
Database(Connection('localhost', 27017), u'test')
>>> conn.test.foo
Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
>>> conn['test-db']
Database(Connection('localhost', 27017), u'test-db')
>>> conn['test-db']['foo-collection']
Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
>>> conn.test.foo.bar.baz
Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')
8. PyMongo: Insert / Update / Delete
>>> db = conn.test
>>> id = db.foo.insert({'bar': 1, 'baz': [1, 2, {'k': 5}]})
>>> id
ObjectId('4e712e21eb033009fa000000')
>>> db.foo.find()
>>> list(db.foo.find())
[{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}]
>>> db.foo.update({'_id': id}, {'$set': {'bar': 2}})
>>> db.foo.find().next()
{u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}
>>> db.foo.remove({'_id': id})
>>> list(db.foo.find())
[]
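The ObjectId returned by `insert` is not random. Under the BSON ObjectId format, its first four bytes are a big-endian count of seconds since the Unix epoch, so ids created later sort later (roughly). The minimal sketch below decodes that timestamp using only the standard library. The helper name `objectid_timestamp` is hypothetical. PyMongo's `ObjectId.generation_time` property does the same job on a real ObjectId.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time embedded in a 24-hex-digit ObjectId string.

    Hypothetical helper: the first 4 bytes (8 hex digits) of an ObjectId are a
    big-endian count of seconds since the Unix epoch.
    """
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes (24 hex digits)")
    seconds = int.from_bytes(bytes.fromhex(oid_hex[:8]), "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The ObjectId that db.foo.insert(...) returned above
print(objectid_timestamp("4e712e21eb033009fa000000"))
# 2011-09-14 22:43:45+00:00
```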
9. PyMongo: Queries, Indexes
>>> db.foo.insert([dict(x=x) for x in range(10)])
[ObjectId('4e71313aeb033009fa00000b'), ...]
>>> list(db.foo.find({'x': {'$gt': 3}}))
[{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
 {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
 {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, ...]
>>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}))
[{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8}, {u'x': 9}]
>>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}).skip(1).limit(2))
[{u'x': 5}, {u'x': 6}]
>>> db.foo.ensure_index([('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])
u'x_1_y_-1'
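To make the semantics of these calls concrete, here is a toy in-memory version of the same operations on plain dicts. It covers a `$gt` filter, a projection that drops `_id`, `skip`/`limit`, and the `<field>_<direction>` naming scheme that produced `u'x_1_y_-1'`. This is an illustrative sketch and not how the server evaluates queries. The function names `matches`, `find`, and `index_name` are hypothetical, and the matcher handles only equality and `$gt`.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / DESCENDING


def matches(doc, query):
    """Return True if doc satisfies a query using equality or {'$gt': n}."""
    for field, cond in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op != "$gt":
                    raise NotImplementedError(op)
                if not value > arg:
                    return False
        elif value != cond:
            return False
    return True


def find(docs, query, projection=None, skip=0, limit=0):
    """Filter, project, then apply skip/limit (limit=0 means no limit)."""
    hits = [d for d in docs if matches(d, query)]
    if projection:
        # Only exclusion projections such as {'_id': 0} are supported here
        hits = [{k: v for k, v in d.items() if projection.get(k, 1)} for d in hits]
    hits = hits[skip:]
    return hits[:limit] if limit else hits


def index_name(keys):
    """Join field_direction pairs with '_', the scheme behind u'x_1_y_-1'."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


docs = [{"_id": i, "x": i} for i in range(10)]
print(find(docs, {"x": {"$gt": 3}}, {"_id": 0}, skip=1, limit=2))
# [{'x': 5}, {'x': 6}]
print(index_name([("x", ASCENDING), ("y", DESCENDING)]))
# x_1_y_-1
```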
10. PyMongo: Aggregation et al.
- You gotta write JavaScript (for now)
- It's pretty slow (single-threaded JS engine)
- JavaScript is used by
  - $where in a query
  - .group(key, condition, initial, reduce, finalize=None)
  - .map_reduce(map, reduce, out, finalize=None, ...)
- If you shard, you can get some parallelism across multiple mongod instances with .map_reduce() (and possibly $where). Otherwise you're single-threaded.
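The `.map_reduce()` programming model can be shown without a server. `map` emits key/value pairs for each document, the values are grouped by key, `reduce` folds each group, and an optional `finalize` post-processes each result. The sketch below runs all of that in-process in Python purely to illustrate the model. MongoDB itself ran these functions as JavaScript on the server, and `map_reduce` here is a hypothetical helper, not PyMongo's method. As with MongoDB's reduce functions, `sum_counts` gives the same answer when re-applied to partial results, which is what lets shards reduce in parallel.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn, finalize_fn=None):
    """In-process illustration of the map/reduce model (not MongoDB's engine)."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):  # map: emit zero or more (key, value) pairs
            groups[key].append(value)
    results = {}
    for key, values in groups.items():
        reduced = reduce_fn(key, values)  # reduce: fold one key's values
        results[key] = finalize_fn(key, reduced) if finalize_fn else reduced
    return results


pages = [
    {"title": "Cats", "tags": ["lol", "cats"]},
    {"title": "Dogs", "tags": ["dogs"]},
    {"title": "More cats", "tags": ["cats"]},
]


def emit_tags(doc):
    for tag in doc["tags"]:
        yield tag, 1


def sum_counts(key, values):
    return sum(values)


print(map_reduce(pages, emit_tags, sum_counts))
# {'lol': 1, 'cats': 2, 'dogs': 1}
```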
11. PyMongo: GridFS
>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> with fs.new_file() as fp:
...     fp.write('The file')
...
>>> fp
>>> fp._id
ObjectId('4e727f64eb03300c0b000003')
>>> fs.get(fp._id).read()
'The file'
- Arbitrary data can be attached to the fp object; it's just a Document
  - Mime type
  - Filename
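Under the hood, GridFS stores each file as one metadata document in `fs.files` plus a series of numbered documents in `fs.chunks`, each holding a fixed-size slice of the bytes. That is why arbitrary fields such as a MIME type or filename can ride along on the file document. The minimal pure-Python sketch below shows that split and reassembly. It assumes a tiny 4-byte chunk size for readability, whereas real GridFS uses a much larger default. The helpers `to_chunks` and `from_chunks` are hypothetical.

```python
def to_chunks(file_id, data: bytes, chunk_size: int):
    """Split data into GridFS-style chunk documents: files_id, n, data."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def from_chunks(chunks):
    """Reassemble a file by concatenating its chunks in order of n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


# File metadata document; the extra fields are just document fields
file_doc = {"_id": 1, "filename": "foo.txt", "contentType": "text/plain",
            "chunkSize": 4, "length": len(b"The file")}
chunks = to_chunks(file_doc["_id"], b"The file", file_doc["chunkSize"])
print([c["data"] for c in chunks])  # [b'The ', b'file']
print(from_chunks(reversed(chunks)))  # b'The file'
```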
12. PyMongo: GridFS Versioning
>>> file_id = fs.put('Moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Moar data!'
>>> file_id = fs.put('Even moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Even moar data!'
>>> fs.get_version('foo.txt', -2).read()
'Moar data!'
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[]
13.
- SourceForge and MongoDB
- Get started with PyMongo
- Sprinkle in some Ming schemas
- ORM: When a dict just won't do
- What we are learning
14. Why Ming?
- Your data has a schema
  - Your database can define and enforce it
  - It can live in your application (as with MongoDB)
  - Nice to have the schema defined in one place in the code
- Sometimes you need a migration
  - Changing the structure/meaning of fields
  - Adding indexes, particularly unique indexes
  - Sometimes lazy, sometimes eager
- Unit of work: Queuing up all your updates can be handy
- Python dicts are nice; objects are nicer
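The "sometimes lazy, sometimes eager" point is easiest to see in code. In a lazy migration, each document records the schema version it was written with and is upgraded the next time the application loads it. An eager migration runs the same upgrade function over the whole collection up front. The sketch below is generic Python under those assumptions. The `schema_version` field, the `MIGRATIONS` table, and the example field rename are all hypothetical, and this is not Ming's migration API.

```python
CURRENT_VERSION = 2


def v0_to_v1(doc):
    # Hypothetical change of meaning: 'name' becomes 'title'
    doc["title"] = doc.pop("name")
    return doc


def v1_to_v2(doc):
    # Hypothetical new field with a default value
    doc.setdefault("tags", [])
    return doc


MIGRATIONS = {0: v0_to_v1, 1: v1_to_v2}


def migrate(doc):
    """Upgrade doc one step at a time until it reaches CURRENT_VERSION."""
    version = doc.get("schema_version", 0)
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version += 1
        doc["schema_version"] = version
    return doc


# Lazy: call migrate() on each document as it is loaded.
# Eager: run migrate() over every document in the collection once.
old = {"_id": 1, "name": "Cats"}
print(migrate(old))
# {'_id': 1, 'title': 'Cats', 'schema_version': 2, 'tags': []}
```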
15. Ming: Engines & Sessions
>>> import ming.datastore
>>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')
>>> ds.db
Database(Connection('localhost', 27017), u'test')
>>> session = ming.Session(ds)
>>> session.db
Database(Connection('localhost', 27017), u'test')
>>> ming.configure(**{'ming.main.master': 'mongodb://localhost:27017',
...                   'ming.main.database': 'test'})
>>> ming.Session.by_name('main').db
Database(Connection(u'localhost', 27017), u'test')
16. Ming: Define Your Schema
from ming import schema, Field, collection

WikiDoc = collection('wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))
CommentDoc = collection('comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))
17. Ming: Define Your Schema... Once more, with feeling
from ming import Document, Session, Field

class WikiDoc(Document):
    class __mongometa__:
        session = Session.by_name('main')
        name = 'wiki_page'
        indexes = [('title',)]
    title = Field(str)
    text = Field(str)
- The old declarative syntax continues to exist and be supported, but it's not being actively improved
18. Ming: Use Your Schema
>>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
>>> doc.m.save()
>>> WikiDoc.m.find()
>>> WikiDoc.m.find().all()
[{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
>>> WikiDoc.m.find().one().text
u'I can haz cheezburger?'
>>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
>>> doc.m.save()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ming.schema.Invalid: <...>: Extra keys: set(['tietul'])
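In that traceback, `m.save()` validates the document against the declared fields before writing it, and it rejects keys the schema does not know about. Below is a minimal, library-free sketch of that check. It assumes a schema expressed as a plain `{field: type}` dict. `validate` and this `Invalid` class are hypothetical stand-ins, not Ming's `ming.schema` classes.

```python
class Invalid(Exception):
    """Hypothetical stand-in for a schema validation error."""


WIKI_SCHEMA = {"_id": object, "title": str, "text": str}


def validate(doc, schema):
    """Reject unknown keys and type mismatches; return a validated copy."""
    extra = set(doc) - set(schema)
    if extra:
        raise Invalid(f"Extra keys: {sorted(extra)}")
    for field, value in doc.items():
        if not isinstance(value, schema[field]):
            raise Invalid(f"{field}: expected {schema[field].__name__}")
    return dict(doc)


validate({"title": "Cats", "text": "I can haz cheezburger?"}, WIKI_SCHEMA)
try:
    validate({"tietul": "LOL", "text": "Invisible bicycle"}, WIKI_SCHEMA)
except Invalid as exc:
    print(exc)  # Extra keys: ['tietul']
```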
19. Ming: Adding Your Own Types
- Not usually necessary; built-in SchemaItems provide BSON types, default values, etc.
from ming.schema import FancySchemaItem, Invalid

class ForceInt(FancySchemaItem):
    def _validate(self, value, **kw):
        # Coerce anything int() accepts; int('abc') raises ValueError, int(None) raises TypeError
        try:
            return int(value)
        except (TypeError, ValueError):
            raise Invalid('Bad value %s' % value, value, None)
20. Ming Bonus: Mongo-in-Memory
>>> ming.datastore.DataStore('mim://', database='test').db
mim.Database(test)
- MongoDB is (generally) fast
  - except when creating databases
  - particularly when you preallocate
- Unit tests like things to be isolated
- MIM gives you isolation at the expense of speed & scaling
21.
- SourceForge and MongoDB
- Get started with PyMongo
- Sprinkle in some Ming schemas
- ORM: When a dict just won't do
- What we are learning
22. Ming ORM: Classes and Collections
from ming import schema, Field, collection
from ming.orm import (mapper, Mapper, RelationProperty,
                      ForeignIdProperty)

WikiDoc = collection('wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))
CommentDoc = collection('comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))

class WikiPage(object): pass
class Comment(object): pass

ormsession.mapper(WikiPage, WikiDoc, properties=dict(
    comments=RelationProperty('Comment')))
ormsession.mapper(Comment, CommentDoc, properties=dict(
    page_id=ForeignIdProperty('WikiPage'),
    page=RelationProperty('WikiPage')))

Mapper.compile_all()
23. Ming ORM: Classes and Collections (declarative)
from ming import schema as S
from ming.orm import FieldProperty, ForeignIdProperty, RelationProperty
from ming.orm.declarative import MappedClass

class WikiPage(MappedClass):
    class __mongometa__:
        session = main_orm_session
        name = 'wiki_page'
        indexes = ['title']
    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)

class CommentDoc(MappedClass):
    class __mongometa__:
        session = main_orm_session
        name = 'comment'
        indexes = ['page_id']
    _id = FieldProperty(S.ObjectId)
    page_id = ForeignIdProperty(WikiPage)
    page = RelationProperty(WikiPage)
    text = FieldProperty(str)
24. Ming ORM: Sessions and Queries
- Session → ORMSession
- My_collection.m → My_mapped_class.query
- ORMSession actually does stuff
  - Track object identity
  - Track object modifications
  - Unit of work flushing all changes at once
>>> pg = WikiPage(title='MyPage', text='is here')
>>> session.db.wiki_page.count()
0
>>> main_orm_session.flush()
>>> session.db.wiki_page.count()
1
25. Ming ORM: Extending the Session
- Various plug points in the session
  - before_flush
  - after_flush
- Some uses
  - Logging changes to sensitive data or for analytics purposes
  - Full-text search indexing
  - "Last modified" fields
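The ORMSession behaviour from the last two slides can be sketched in a few dozen lines of plain Python: identity tracking, dirty tracking, one flush for all pending changes, and before/after flush hooks. Everything here is hypothetical and deliberately simplified. `ToyORMSession` writes into a dict that stands in for a collection. It tracks dirtiness through an explicit `mark_dirty` call rather than attribute instrumentation. It uses a `before_flush` hook to maintain a last-modified field, one of the uses listed above. It is not Ming's extension API.

```python
from datetime import datetime, timezone


class ToyORMSession:
    """Hypothetical unit-of-work session over an in-memory 'collection'."""

    def __init__(self):
        self.collection = {}      # _id -> stored document (stands in for MongoDB)
        self.identity_map = {}    # _id -> the one live object for that _id
        self.new, self.dirty = [], set()
        self.before_flush_hooks, self.after_flush_hooks = [], []

    def add(self, obj):
        self.identity_map[obj["_id"]] = obj
        self.new.append(obj)

    def mark_dirty(self, obj):
        self.dirty.add(obj["_id"])

    def get(self, _id):
        # Identity: loading the same _id twice returns the same object
        if _id not in self.identity_map:
            self.identity_map[_id] = dict(self.collection[_id])
        return self.identity_map[_id]

    def flush(self):
        # Collect new and modified objects once each, keyed by _id
        pending = {obj["_id"]: obj for obj in self.new}
        pending.update((i, self.identity_map[i]) for i in self.dirty)
        pending = list(pending.values())
        for hook in self.before_flush_hooks:
            for obj in pending:
                hook(obj)
        for obj in pending:       # all writes happen here, at flush time
            self.collection[obj["_id"]] = dict(obj)
        self.new, self.dirty = [], set()
        for hook in self.after_flush_hooks:
            hook(pending)


def touch_last_modified(obj):
    obj["last_modified"] = datetime.now(timezone.utc)


session = ToyORMSession()
session.before_flush_hooks.append(touch_last_modified)
session.add({"_id": 1, "title": "MyPage", "text": "is here"})
print(len(session.collection))  # 0: nothing written before flush
session.flush()
print(len(session.collection))  # 1
```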
26.
- SourceForge and MongoDB
- Get started with PyMongo
- Sprinkle in some Ming schemas
- ORM: When a dict just won't do
- What we are learning
27. Tips From the Trenches
- Watch your document size
- Choose your indexes well
  - Watch your server log; bad queries show up there
- Don't go crazy with denormalization
  - Try to use an index if all you need is a backref
  - Stale data is a tricky problem
- Try to stay with one database
- Watch the # of queries
- Drop to lower levels (ORM → document → pymongo) when performance is an issue
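"Watch the # of queries" usually means avoiding one query per related object (the N+1 pattern). "Use an index if all you need is a backref" means querying the child collection by its indexed foreign key instead of storing a list of child ids on the parent. The pure-Python sketch below uses a hypothetical query-counting collection to show the difference. In PyMongo, the batched form would be a single `find({'page_id': {'$in': ids}})` against the indexed `page_id` field.

```python
from collections import defaultdict


class CountingCollection:
    """Hypothetical in-memory collection that counts find() calls."""

    def __init__(self, docs):
        self.docs = docs
        self.queries = 0

    def find(self, field, values):
        # Stands in for find({field: {'$in': values}}) against an indexed field
        self.queries += 1
        wanted = set(values)
        return [d for d in self.docs if d[field] in wanted]


comments = CountingCollection([
    {"_id": 10, "page_id": 1, "text": "first"},
    {"_id": 11, "page_id": 1, "text": "second"},
    {"_id": 12, "page_id": 2, "text": "third"},
])
page_ids = [1, 2, 3]

# N+1 style: one query per page
per_page = {pid: comments.find("page_id", [pid]) for pid in page_ids}
print(comments.queries)  # 3

# Batched: one $in-style query, then group the results in the application
comments.queries = 0
by_page = defaultdict(list)
for c in comments.find("page_id", page_ids):
    by_page[c["page_id"]].append(c)
print(comments.queries)  # 1
```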
28. Future Work
- Performance
- Analytics in MongoDB: Zarkov
- Web framework integration
- Magic Columns (?)
- ???
29. Related Projects
- Ming: http://sf.net/projects/merciless/ (MIT License)
- Zarkov: http://sf.net/p/zarkov/ (Apache License)
- Allura: http://sf.net/p/allura/ (Apache License)
- PyMongo: http://api.mongodb.org/python (Apache License)
30. Rick Copeland, @rick446, [email_address]