Managing the web nuggets with MongoDB and MongoKit
This is a guest post by Nicolas Clairon, maintainer of MongoKit and founder of Elkorado.
MongoKit is a Python ODM for MongoDB. I created it in 2009 (when the ODM acronym wasn’t even used) for my startup project, Elkorado. Now that the service is live, I realize that I never wrote about MongoKit. I’d like to introduce it to you with this quick tutorial based on real use cases from Elkorado.
Elkorado: a place to store web nuggets
Elkorado is a collaborative, interest-based curation tool. It was born out of the frustration that there is no single place to find quality resources about a particular topic of interest. There are so many blogs, forums, videos and websites out there that it is very difficult to find our way through this massive wealth of information.
Elkorado aims to help people centralize quality content, so they can easily find it again later and discover new resources.
MongoDB to the rescue
Rapid prototyping is one of the most important things in the startup world, and it is an area where MongoDB shines.
The web is changing fast, and so are web resources and their metadata. MongoDB’s schemaless model is a perfect fit for storing this kind of data. After losing hair trying to use polymorphism with SQL databases, I moved to MongoDB… and I fell in love with it.
While playing with the data, I needed a validation layer and wanted to add some methods to my documents. Back then, there was no ODM for Python. And so I created MongoKit.
MongoKit: MongoDB ODM for Python
MongoKit is a thin layer on top of PyMongo. It brings field validation, inheritance, polymorphism and a bunch of other features. Let’s see how it is used in Elkorado.
Elkorado is a collection of quality web resources called nuggets. This is how we could fetch a nugget discovered by the user “namlook” with PyMongo:
>>> import pymongo
>>> con = pymongo.Connection()
>>> nugget = con.elkorado.nuggets.find_one({"discoverer": "namlook"})
Here, nugget is a regular Python dict.
Here’s a simple nugget definition with MongoKit:
import mongokit

connection = mongokit.Connection()

@connection.register
class Nugget(mongokit.Document):
    __database__ = "elkorado"
    __collection__ = "nuggets"
    structure = {
        "url": unicode,
        "discoverer": unicode,
        "topics": list,
        "popularity": int
    }
    default_values = {"popularity": 0}

    def is_popular(self):
        """ this is for the example purpose """
        # key access works whether or not use_dot_notation is enabled
        return self["popularity"] > 1000
Fetching a nugget with MongoKit is pretty much the same:
nugget = connection.Nugget.find_one({"discoverer": "namlook"})
However, this time, nugget is a Nugget object and we can call the is_popular method on it:
>>> nugget.is_popular()
True
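To picture the difference without a running MongoDB server, here is a minimal sketch of the idea: an object that is still a dict but carries its own methods. The class name NuggetSketch and the sample data are made up for the illustration, and the code targets Python 3; it shows the concept, not MongoKit's actual implementation.

```python
# Hypothetical stand-in for a document class: a dict subclass with methods.
class NuggetSketch(dict):
    def is_popular(self):
        # Same rule as the Nugget example: popular above 1000.
        return self["popularity"] > 1000


# A raw dict, as a driver would hand it back (sample data, made up).
raw = {"url": "http://www.example.org", "discoverer": "namlook",
       "topics": [], "popularity": 1500}

nugget = NuggetSketch(raw)
print(isinstance(nugget, dict))  # the wrapper can still be used as a dict
print(nugget.is_popular())
```

The wrapped object keeps every dict operation (key access, iteration, serialization), which is what makes this style of document class cheap to adopt on top of a driver that returns dicts.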
One of the main advantages of MongoKit is that all your models are registered and accessible via the connection instance. MongoKit looks at the __database__ and __collection__ fields to know which database and which collection to use. This way, there is only one place where those values have to be specified.
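The registration idea can be sketched in a few lines of plain Python. Everything here (ConnectionSketch, its location helper) is hypothetical and only mimics the pattern described above: a decorator stores the class on the connection, and the database and collection names are read from the class itself. It is not MongoKit's code.

```python
class ConnectionSketch:
    """Hypothetical registry, loosely mimicking connection.register."""

    def __init__(self):
        self._registered = {}

    def register(self, cls):
        # Returning the class lets register() be used as a decorator.
        self._registered[cls.__name__] = cls
        return cls

    def __getattr__(self, name):
        # Called only when normal lookup fails, e.g. connection.Nugget.
        registered = self.__dict__.get("_registered", {})
        if name in registered:
            return registered[name]
        raise AttributeError(name)

    def location(self, name):
        # The database and collection names live on the class itself.
        cls = self._registered[name]
        return cls.__database__, cls.__collection__


connection = ConnectionSketch()


@connection.register
class Nugget(dict):
    __database__ = "elkorado"
    __collection__ = "nuggets"


print(connection.Nugget is Nugget)
print(connection.location("Nugget"))
```

Because the names are declared once on the class, any code holding the connection can find the right collection without repeating them.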
Inheritance
MongoKit was built from the start to natively support inheritance:
from datetime import datetime

class Core(mongokit.Document):
    __database__ = "elkorado"
    use_dot_notation = True
    structure = {
        "created_at": datetime,
        "updated_at": datetime
    }
    default_values = {
        "created_at": datetime.utcnow,
        "updated_at": datetime.utcnow
    }

    def save(self, *args, **kwargs):
        # refresh the modification date on every save
        self.updated_at = datetime.utcnow()
        super(Core, self).save(*args, **kwargs)
In this Core object, we are defining the database name and some fields that will be shared by other models.
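Note that the defaults in Core are given as datetime.utcnow without parentheses, so the function itself is stored and can be called when a document is created rather than when the class is defined. A rough sketch of that behaviour, assuming a hypothetical apply_defaults helper (Python 3, not MongoKit's implementation):

```python
from datetime import datetime, timezone


def utcnow():
    # Timezone-aware "now", used here as a callable default.
    return datetime.now(timezone.utc)


def apply_defaults(doc, default_values):
    """Fill missing keys; a callable default is called at fill time."""
    for key, default in default_values.items():
        if key not in doc:
            doc[key] = default() if callable(default) else default
    return doc


defaults = {"created_at": utcnow, "popularity": 0}
doc = apply_defaults({"url": "http://www.example.org"}, defaults)
print(sorted(doc))
```

Storing the function instead of its result is what keeps every new document from sharing the single timestamp computed at import time.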
If one wants a Nugget object to have date metadata, one just has to make it inherit from Core:
@connection.register
class Nugget(Core):
    __collection__ = "nuggets"
    structure = {
        "url": unicode,
        "topics": list,
        "discoverer": unicode,
        "popularity": int
    }
    default_values = {"popularity": 0}
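One way to picture how a child model ends up with both its own fields and those of its parent is to merge the structure dicts along the class hierarchy. The sketch below is an assumption about the general technique, not MongoKit's code; merged_structure, CoreSketch and NuggetSketch are hypothetical names, and str replaces unicode because the example targets Python 3.

```python
from datetime import datetime


def merged_structure(cls):
    """Combine the `structure` dicts declared along a class hierarchy."""
    merged = {}
    # Walk from the most generic base to the most specific class,
    # so subclasses win when the same key is declared twice.
    for klass in reversed(cls.__mro__):
        merged.update(vars(klass).get("structure", {}))
    return merged


class CoreSketch:
    structure = {"created_at": datetime, "updated_at": datetime}


class NuggetSketch(CoreSketch):
    structure = {"url": str, "topics": list,
                 "discoverer": str, "popularity": int}


print(sorted(merged_structure(NuggetSketch)))
```

Reading vars(klass) rather than klass.structure matters here: plain attribute lookup would return the inherited dict for classes that declare nothing, and the parent fields would be merged twice.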
It’s all about PyMongo
With MongoKit, you are still very close to PyMongo. In fact, MongoKit’s connection, database and collection are subclasses of PyMongo’s. If an algorithm needs raw performance, you can use the PyMongo layer directly, which is blazing fast:
>>> nuggets = connection.Nugget.find() # cursor yielding Nugget objects
>>> nuggets = connection.elkorado.nuggets.find() # cursor yielding plain Python dicts
Here, connection is a MongoKit connection but it can be used like a PyMongo connection. Note that to keep the benefit of DRY, we can call the PyMongo layer from a MongoKit document:
>>> nuggets = connection.Nugget.collection.find() # fast!
A real life “simplified” example
Let’s see an example of CRUD done with MongoKit.
On Elkorado, each nugget is unique, but multiple users can share a nugget, each with different metadata. Each time a user picks up a nugget, a UserNugget is created with user-specific information. If this is the first time the nugget is discovered, a Nugget object is created; otherwise, it is updated. Here is a simplified UserNugget structure:
from mongokit import ObjectId, Connection

connection = Connection()

@connection.register
class UserNugget(Core):
    __collection__ = "user_nuggets"
    structure = {
        "url": unicode,
        "topics": [unicode],
        "user_id": unicode
    }
    required_fields = ["url", "topics", "user_id"]

    def save(self, *args, **kwargs):
        super(UserNugget, self).save(*args, **kwargs)
        # create the shared nugget the first time this URL is picked up
        nugget = self.db.Nugget.find_one({"url": self.url})
        if not nugget:
            nugget = self.db.Nugget({"url": self.url, "discoverer": self.user_id})
            nugget.save()
        # atomically merge the topics and bump the popularity counter
        self.db.Nugget.collection.update(
            {"url": self.url},
            {"$addToSet": {"topics": {"$each": self.topics}},
             "$inc": {"popularity": 1}})
This example gives a good picture of what can be done with MongoKit. Here, the save method has been overridden to check whether a nugget exists (remember, each nugget is unique by its URL). It creates the nugget if it does not exist yet, then updates it.
Updating data with MongoKit is similar to PyMongo. Use save on the object, or use the PyMongo layer directly to make atomic updates. Here, we use an atomic update to add new topics and increase the popularity:
self.db.Nugget.collection.update({"url": self.url}, {
    "$addToSet": {"topics": {"$each": self.topics}},
    "$inc": {"popularity": 1}
})
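For readers new to these operators: $addToSet adds a value to an array only if it is not already there, $each lets it take several values at once, and $inc expects a document mapping field names to amounts. The following pure-Python sketch imitates that small subset on a plain dict so the effect can be seen without a server. apply_update is a hypothetical helper written for this illustration (Python 3); the real operators run inside MongoDB.

```python
def apply_update(doc, update):
    """Apply a tiny subset of MongoDB update operators to a dict, in place."""
    for field, spec in update.get("$addToSet", {}).items():
        # With "$each", every listed value is considered; otherwise one value.
        if isinstance(spec, dict) and "$each" in spec:
            values = spec["$each"]
        else:
            values = [spec]
        target = doc.setdefault(field, [])
        for value in values:
            if value not in target:  # set semantics: no duplicates
                target.append(value)
    for field, amount in update.get("$inc", {}).items():
        # A missing field is treated as 0 before incrementing.
        doc[field] = doc.get(field, 0) + amount
    return doc


nugget = {"url": "http://www.example.org/blog/post123",
          "topics": ["example"], "popularity": 1}
apply_update(nugget, {
    "$addToSet": {"topics": {"$each": ["example", "fun"]}},
    "$inc": {"popularity": 1},
})
print(nugget["topics"], nugget["popularity"])
```

On the server the whole update is applied atomically to the matched document, which is why two users picking up the same nugget at the same time cannot lose each other's topics or popularity increment.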
Getting live
Let’s play with our model:
>>> user_nugget = connection.UserNugget()
>>> user_nugget.url = u"http://www.example.org/blog/post123"
>>> user_nugget.user_id = u"namlook"
>>> user_nugget.topics = [u"example", u"fun"]
>>> user_nugget.save()
When calling the save method, the document is validated against the UserNugget’s structure. As expected, the fields created_at and updated_at have been added:
>>> user_nugget
{
    "_id": ObjectId("4f314163a1e5fa16fe000000"),
    "created_at": datetime.datetime(2013, 8, 4, 17, 22, 8, 3000),
    "updated_at": datetime.datetime(2013, 8, 4, 17, 22, 8, 3000),
    "url": u"http://www.example.org/blog/post123",
    "user_id": u"namlook",
    "topics": [u"example", u"fun"]
}
The related nugget has been created as well:
>>> connection.Nugget.find_one({"url": u"http://www.example.org/blog/post123"})
{
    "_id": ObjectId("4f314163a1e5fa16fe000001"),
    "created_at": datetime.datetime(2013, 8, 4, 17, 22, 8, 3000),
    "updated_at": datetime.datetime(2013, 8, 4, 17, 22, 8, 3000),
    "url": u"http://www.example.org/blog/post123",
    "discoverer": u"namlook",
    "topics": [u"example", u"fun"],
    "popularity": 1
}
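The validation that happens on save can be pictured with a small stand-alone check of a dict against a structure and a list of required fields. This is only a sketch under assumptions: validate is a hypothetical helper, the error types are my choice, and str replaces unicode because the example targets Python 3. MongoKit's own validation is richer than this.

```python
def validate(doc, structure, required_fields=()):
    """Raise if doc does not match a {field: type} structure."""
    for field in required_fields:
        if doc.get(field) is None:
            raise ValueError("%s is required" % field)
    for field, value in doc.items():
        if field not in structure:
            raise ValueError("%s is not in the structure" % field)
        expected = structure[field]
        if isinstance(expected, list):
            # [str] means: a list whose items are all str.
            ok = isinstance(value, list) and all(
                isinstance(item, expected[0]) for item in value)
        else:
            ok = value is None or isinstance(value, expected)
        if not ok:
            raise TypeError("%s has an unexpected type" % field)


structure = {"url": str, "topics": [str], "user_id": str}
required = ["url", "topics", "user_id"]

validate({"url": "http://www.example.org/blog/post123",
          "topics": ["example", "fun"],
          "user_id": "namlook"}, structure, required)
print("valid")
```

Failing early at save time, before anything reaches the database, is the main reason to keep a structure declaration even when the database itself is schemaless.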
Conclusion
MongoKit is a central piece of Elkorado. It has been written to be small and minimalist but powerful. There is so much more to say about features like inherited queries, i18n and GridFS, so take a look at the wiki to read more about how this tool can help you.
Check the documentation for more information about MongoKit. And if you register on Elkorado, check out the nuggets about MongoDB. Don’t hesitate to share your nuggets as well: the more the merrier.
September 27, 2013