Using the Python toolkit Ming to accelerate your MongoDB development
This is a guest post from Rick Copeland of Arborian.
Ming is a Python toolkit providing schema enforcement, an object/document mapper, an in-memory database, and various other goodies developed at SourceForge during our rewrite of the site from a PHP/Postgres stack to a Python/MongoDB one.
Why Ming?
If you’ve come to MongoDB from the world of relational databases, you have probably been struck by just how easy everything is: no big object/relational mapper needed, no new query language to learn (well, maybe a little, but we’ll gloss over that for now), everything is just Python dictionaries, and it’s so, so fast! While this is all true to some extent, one of the big things you give up with MongoDB is structure.
MongoDB is sometimes referred to as a schema-free database. (This is not technically true; I find it more useful to think of MongoDB as having dynamically typed documents. The collection doesn’t tell you anything about the type of documents it contains, but each individual document can be inspected.) While this can be nice, as it’s easy to iterate on your schema quickly in development, it’s also easy to get yourself in trouble the first time your application tries to query by a field that only exists in some of your documents.
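To make that pitfall concrete, here is a minimal sketch that uses plain Python dictionaries in place of a real collection. Nothing here touches MongoDB or Ming; the sample documents and the `get_path` and `find` helpers are hypothetical and only mimic equality matching on a dotted field path.

```python
# Plain dicts stand in for documents in one collection (hypothetical data).
posts = [
    {"title": "First post", "author": {"username": "rick446"}},
    {"title": "Second post", "author_name": "rick446"},  # older document shape
]


def get_path(doc, dotted):
    """Follow a dotted path such as 'author.username'; return None if absent."""
    for key in dotted.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def find(docs, path, value):
    """Return the documents whose value at the dotted path equals value."""
    return [d for d in docs if get_path(d, path) == value]


# The second document is silently skipped because it lacks 'author.username'.
matches = find(posts, "author.username", "rick446")
print([d["title"] for d in matches])
```

Both documents were written by the same author, but only the one with the newer shape is found, and nothing raises an error to tell you so.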
The fact of the matter is that even if the database cares nothing about your schema, your application does, and if you play too fast and loose with document structure, it will come back to haunt you in the end. At SourceForge, we created Ming (as in “…the Merciless”, the villain who ruled the planet Mongo in Flash Gordon) to deal with precisely this problem. We wanted a (thin) layer on top of PyMongo that would do a couple of things for us:
Make sure that we don’t put malformed data into the database
Try to ‘fix’ malformed data coming back from the database
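The two goals above can be sketched in a few lines of plain Python. This is not Ming’s implementation or API, only an illustration of the idea under stated assumptions: `POST_SCHEMA`, `Invalid`, and `validate` are hypothetical names, and the rule that a missing field without a default becomes `None` is a choice made for this sketch.

```python
from datetime import datetime, timezone

# Hypothetical schema: field name -> (expected type, default factory or None).
POST_SCHEMA = {
    "title": (str, None),
    "text": (str, None),
    "posted": (datetime, lambda: datetime.now(timezone.utc)),
}


class Invalid(Exception):
    """Raised when a document cannot be made to fit the schema."""


def validate(doc, schema=POST_SCHEMA):
    """Return a cleaned copy of doc, or raise Invalid."""
    unknown = sorted(set(doc) - set(schema))
    if unknown:
        raise Invalid("unexpected fields: %s" % unknown)
    clean = {}
    for name, (expected, default) in schema.items():
        if name in doc:
            value = doc[name]
        else:
            # 'Fix' a missing field with its default, or None if it has none.
            value = default() if default is not None else None
        if value is not None and not isinstance(value, expected):
            raise Invalid("%s should be %s" % (name, expected.__name__))
        clean[name] = value
    return clean


clean = validate({"title": "Hello"})  # 'text' and 'posted' are filled in
```

Running the same function on documents going into the database rejects malformed data, and running it on documents coming out fills in fields that older documents lack.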
Ming’s Architecture
Ming’s architecture is based on the excellent SQL toolkit SQLAlchemy. While much younger than SQLAlchemy and not including any of its code, Ming takes its design inspiration from there.
Ming actually consists of a number of components, including:
The schema enforcement layer - This is ‘basic’ Ming, providing validation and conversion of documents on their way in and out of MongoDB. There are actually two APIs at this layer, the imperative syntax and a more declarative syntax.
The object/document mapper - The ODM layer extends the schema enforcement layer by providing a unit of work, an identity map, and pseudo-relational concepts (one-to-many joins, for instance).
MongoDB-in-Memory - This layer is designed to be a drop-in replacement for the native pymongo driver, used for testing your application without needing access to a MongoDB server.
Let’s take a look at each of these components in turn…
Ming Schema Enforcement
A Ming schema is fairly straightforward. Below is an example containing the schema for a blog post in both the imperative and declarative syntaxes:
from datetime import datetime

from ming import collection, Field, Session
from ming import schema as S

session = Session()  # ming abstraction for database

# Set up the User schema ahead-of-time
User = dict(username=str, display_name=str)

# "Imperative" style
BlogPost = collection(
    'blog.posts', session,
    Field('_id', S.ObjectId),
    Field('posted', datetime, if_missing=datetime.utcnow),
    Field('title', str),
    Field('author', User),
    Field('text', str),
    Field('comments', [
        dict(author=User,
             posted=S.DateTime(if_missing=datetime.utcnow),
             text=str)
    ]))

# "Declarative" style
from ming.declarative import Document

class BlogPost(Document):

    class __mongometa__:
        session = session
        name = 'blog.posts'
        # Index the username field defined in the User schema
        indexes = ['author.username', 'comments.author.username']

    _id = Field(S.ObjectId)
    title = Field(str)
    posted = Field(datetime, if_missing=datetime.utcnow)
    author = Field(User)
    text = Field(str)
    comments = Field([
        dict(author=User,
             posted=datetime,
             text=str)
    ])
Once you have your schema set up, you can use it to perform all the same operations you can do in pymongo using the manager object attached to the attribute m:
# Bind the session to the database
from ming.datastore import DataStore
session.bind = DataStore('mongodb://localhost:27017', database='test')

# Queries
BlogPost.m.find(...)  # equiv. to db.blog.posts.find(...)

# Inserts
post0 = BlogPost(dict(... fields here ...))
post0.m.insert()

# Updates using save()
post1 = BlogPost.m.find({'author.username': 'rick446'}).first()
post1.author.username = 'rick447'
post1.m.save()

# Updates using update_partial()
BlogPost.m.update_partial(
    {'_id': ...},
    {'$push': {'comments': {... comment data ...}}})

# Deletes
post1.m.delete()  # single document
BlogPost.m.remove({... query ...})  # delete by query
The Object-Document Mapper
Building on the schema enforcement layer is the object-document mapper, which provides two useful patterns:
Unit of Work - This pattern collects the changes to your objects in memory until a point at which you flush() them all to the database at once.
Identity Map - This guarantees that if you load the same database document twice, you’ll get the same object in memory. This keeps you from accidentally loading the object twice, modifying it twice, and having your two sets of changes overwrite one another.
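The two patterns can be illustrated with a toy session in plain Python. This is a sketch of the general patterns, not of how Ming implements them: `SketchSession`, its methods, and the dict that stands in for the database are all hypothetical.

```python
class SketchSession:
    """Toy session combining an identity map with a unit of work."""

    def __init__(self, store):
        self.store = store        # dict of _id -> document; stands in for the database
        self.identity_map = {}    # _id -> the single in-memory object for that document
        self.dirty = set()        # _ids with changes not yet written back

    def get(self, _id):
        # Identity map: hand back the already-loaded object if there is one.
        if _id not in self.identity_map:
            self.identity_map[_id] = dict(self.store[_id])
        return self.identity_map[_id]

    def mark_dirty(self, _id):
        self.dirty.add(_id)

    def flush(self):
        # Unit of work: write every pending change in one pass.
        for _id in self.dirty:
            self.store[_id] = dict(self.identity_map[_id])
        self.dirty.clear()


store = {1: {"title": "Draft"}}
sketch = SketchSession(store)

a = sketch.get(1)
b = sketch.get(1)                    # same object as a, not a second copy
a["title"] = "The cool post"
sketch.mark_dirty(1)

before_flush = store[1]["title"]     # the stored document is still unchanged
sketch.flush()
after_flush = store[1]["title"]      # the change has now been written back
```

Because `a` and `b` are the same object, there is no second copy whose stale contents could overwrite the first set of changes at flush time.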
Ming also allows you to model relationships between your documents via ForeignIdProperty and RelationProperty. Here is an example schema for a blog hosting site with multiple blogs:
from ming import schema as S
from ming.odm.declarative import MappedClass
from ming.odm.property import FieldProperty, RelationProperty
from ming.odm.property import ForeignIdProperty
from ming.odm import ODMSession

# wrap the session from the schema layer
odm_session = ODMSession(session)

class Blog(MappedClass):

    class __mongometa__:
        session = odm_session
        name = 'blog.blog'

    _id = FieldProperty(S.ObjectId)
    name = FieldProperty(str)
    posts = RelationProperty('Post')

class Post(MappedClass):

    class __mongometa__:
        session = odm_session
        name = 'blog.posts'

    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)
    blog_id = ForeignIdProperty(Blog)
    blog = RelationProperty(Blog)
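Conceptually, a foreign id plus a relation amounts to looking documents up by a stored _id. The sketch below shows that idea with plain dictionaries; it does not use Ming, and the sample data and the `posts_for` and `blog_for` helpers are hypothetical.

```python
# Plain dicts stand in for the two collections (hypothetical data).
blogs = [{"_id": 1, "name": "MongoDB Blog"}]
posts = [
    {"_id": 10, "title": "Intro to Ming", "blog_id": 1},
    {"_id": 11, "title": "Unrelated", "blog_id": 2},
]


def posts_for(blog):
    """One-to-many side: all posts whose blog_id points at this blog."""
    return [p for p in posts if p["blog_id"] == blog["_id"]]


def blog_for(post):
    """Many-to-one side: the blog whose _id matches the post's blog_id."""
    return next((b for b in blogs if b["_id"] == post["blog_id"]), None)


blog = blogs[0]
titles = [p["title"] for p in posts_for(blog)]
owner = blog_for(posts[0])
```

In the schema above, blog_id plays the role of the stored foreign id, while posts and blog are the two directions of the lookup.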
Once you have the classes defined, you can load and modify the objects, using the odm_session to save your changes to MongoDB:
# Queries
Blog.query.find(...)  # equiv. to db.blog.blog.find(...)
blog = Blog.query.get(name='MongoDB Blog')
blog.posts  # returns a list of post objects for the blog
blog.posts[0].blog  # returns the blog object

# Inserts
post = Post(blog=blog, ...)  # automatically sets blog_id

# Updates
post.title = 'The cool post'

# Save your changes
odm_session.flush()

# Mark post for deletion
post.delete()

# Actually delete
odm_session.flush()
MongoDB-in-Memory
The third main component of Ming is an implementation of the pymongo API that allows you to perform testing of your application without having a dependency on a MongoDB server. To use MIM, you can swap out the creation of your pymongo connection:
from ming import mim
import unittest

class TestCase(unittest.TestCase):

    def setUp(self):
        # self.connection = Connection()
        self.connection = mim.Connection()
MIM’s support of the pymongo API and MongoDB query syntax has largely been driven by the various APIs and queries used internally at SourceForge, so there are some gaps, but these are rapidly filled when reported. For instance, MIM already supports gridfs and mapreduce (JavaScript support for mapreduce is provided by python-spidermonkey). And of course MIM integrates well with the rest of Ming, allowing you to substitute a mim:// URL for the normal mongodb:// URL in your datastore:
from ming.datastore import DataStore
import unittest

class TestCase(unittest.TestCase):

    def setUp(self):
        # A mim:// URL selects the in-memory implementation
        self.ds = DataStore(
            'mim://localhost:27017', database='test')
Conclusion
There are other good bits in Ming, including lazy and eager migrations, support for the MongoDB filesystem gridfs, WSGI auto-flushing middleware for the ODMSession, and more. We’re also experimenting with support for GQL, Google’s query language for the Google App Engine (GAE), to facilitate porting apps from GAE to MongoDB. Ming is actively maintained and is a mission-critical part of the SourceForge application stack, where it’s been in production use for over 2 years.
So what do you think? Is Ming something that you would use for your projects? Have you chosen one of the other MongoDB mappers? Please let us know in the comments below!
To learn more about development with Ming, check out Rick’s ebook MongoDB with Python and Ming or visit the Atlanta MongoDB User Group on Wednesday, where Rick is presenting.
July 24, 2012