There are plenty of existing messaging systems out there (Redis, AMQP, ØMQ, etc.) but I’ve recently found MongoDB to be a very compelling alternative, especially if you’re already running MongoDB somewhere in your setup. Using MongoDB’s capped collections and tailable cursors we can build a simple pub/sub system to communicate messages (documents) between processes.
When retrieving records from a tailable cursor we’re able to instruct the MongoDB server to block until some data becomes available (at which point it will be returned by the cursor). It’s worth noting that the server will time out after a few seconds of waiting and return nothing; in that case the driver you’re using will most likely initiate another blocking call behind the scenes, giving us the impression that the cursor is “listening” for data. This process may sound reminiscent of HTTP long polling in the way that data can be “pushed” to the listener. While we could achieve something similar by constantly re-querying for new data, tailable cursors offer a much nicer solution.
I put together a very basic example to demonstrate this functionality using Node.js. You can grab it from https://gist.github.com/3210919 if you want to follow along. It assumes that you already have MongoDB installed and running locally.
First we need to create the capped collection in which messages will be stored. Unfortunately, it turns out that MongoDB won’t keep a tailable cursor open if the collection is empty, so let’s also create a blank document to “prime” the collection. We’ll fire up the Mongo shell to do this:
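A minimal version of that shell session might look like the following (the collection name `messages` and the 1 MB size are arbitrary choices for this example). Note that “publishing” a message afterwards is nothing more than inserting a document:

```javascript
// In the mongo shell: create a capped collection (1 MB here) and
// prime it with an empty document so tailable cursors will stay open.
db.createCollection('messages', { capped: true, size: 1048576 });
db.messages.insert({});

// "Publishing" a message is just an ordinary insert:
db.messages.insert({ body: 'hello world' });
```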
Without anyone listening for these message inserts, though, we haven’t accomplished anything terribly exciting.
When subscribing to newly inserted messages we first need to find the last document currently in the messages collection. We’ll then use the _id of that document to ensure that our tailable cursor only returns messages created in the future. Beware that since a capped collection does not have a unique index on _id by default, this initial query requires scanning the entire collection. Depending on the size of your capped collection it may be wise to create an index on _id.
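A sketch of that subscribe step, using the legacy node-mongodb-native driver (the database name and connection URL below are assumptions for a local setup):

```javascript
var mongodb = require('mongodb');

// Connection details are assumptions for a local MongoDB instance.
mongodb.MongoClient.connect('mongodb://localhost:27017/pubsub_demo', function(err, db) {
  if (err) throw err;
  var collection = db.collection('messages');

  // Find the last document currently in the collection by walking
  // the capped collection in reverse natural (insertion) order.
  collection.find().sort({ $natural: -1 }).limit(1).toArray(function(err, docs) {
    if (err) throw err;
    var lastId = docs[0]._id;

    // Open a tailable cursor that only returns documents inserted
    // after the one we just found.
    var cursor = collection.find(
      { _id: { $gt: lastId } },
      { tailable: true, awaitdata: true }
    );

    // The cursor is now ready to be polled for new messages.
  });
});
```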
I find the ability to perform complex queries like this an incredibly powerful feature and big selling point of using this setup.
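For example, a subscriber could restrict itself to messages on a single channel simply by adding conditions to the tailable cursor’s query (the `channel` field here is a hypothetical message attribute, not something MongoDB provides):

```javascript
// Only receive future messages published to the "events" channel.
// Assumes `collection` and `lastId` from the subscribe step above.
var cursor = collection.find(
  { _id: { $gt: lastId }, channel: 'events' },
  { tailable: true, awaitdata: true }
);
```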
With our tailable cursor created, we can then repeatedly “poll” the cursor for new messages, keeping in mind that the callback passed to nextObject will not be called until data is available:
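A minimal sketch of that polling loop, again assuming the legacy node-mongodb-native cursor API:

```javascript
// Repeatedly ask the tailable cursor for the next document. The
// callback fires only once a new message has been inserted; behind
// the scenes the driver re-issues the blocking call after each
// server timeout, so this loop never busy-waits.
function poll(cursor) {
  cursor.nextObject(function(err, message) {
    if (err) throw err;
    console.log('received:', message);
    poll(cursor); // go back to waiting for the next message
  });
}

poll(cursor);
```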