Three Underused MongoDB Features
As a Developer Advocate for MongoDB, I have quite a few conversations with developers. Many of these developers have never used MongoDB, and so the conversation is often around what kind of data MongoDB is particularly good for. (Spoiler: Nearly all of them! MongoDB is a general purpose database that just happens to be centered around documents instead of tables.)
But there are lots of developers out there who already use MongoDB every day, and in those situations, my job is to make sure they know how to use MongoDB effectively. I make sure, first and foremost, that these developers know about MongoDB's Aggregation Framework, which is, in my opinion, MongoDB's most powerful feature. It is relatively underused. If you're not using the Aggregation Framework in your projects, then either your project is very simple, or you could probably be doing things more efficiently by adding some aggregation pipelines.
One of the great things about MongoDB is that it's so easy to store data in it, without having to go through complex steps to map your data to a schema your database expects.
Because of this, it's quite common to use MongoDB as a cache as well as a database, to store things like session information, authentication data for third-party services, and other things that are relatively short-lived.
A common idiom is to store an expiry date in the document, and then when retrieving the document, to compare the expiry date to the current time and only use it if it's still valid. In some cases, as with OAuth access tokens, if the token has expired, a new one can be obtained from the OAuth provider and the document can be updated.
Another common idiom also involves storing an expiry date in the document, and then running code periodically that either deletes or refreshes expired documents, depending on what's correct for the use-case.
To use the definition from the documentation: "TTL indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time or at a specific clock time." TTL indexes are why I like to think of MongoDB as a platform for building data applications, not just a database. If you apply a TTL index to your documents' expiry field, MongoDB will automatically remove the document for you! This means that you don't need to write your own code for removing expired documents, and you don't need to remember to always filter documents based on whether their expiry is earlier than the current time. You also don't need to calculate the absolute expiry time if all you have is the number of seconds a document remains valid!
Let me show you how this works. The code below demonstrates how to create a TTL index on the created_at field. Because expireAfterSeconds is set to 3600 (which is one hour), any document in the collection with created_at set to a date will be deleted one hour after that point in time.
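A minimal sketch of that index creation with PyMongo (the collection and field names here are assumptions, and actually running it requires a MongoDB server):

```python
def ensure_session_ttl(sessions):
    """Create a TTL index on created_at.

    With expireAfterSeconds=3600, MongoDB's TTL monitor removes each
    document roughly one hour after the datetime in its created_at field.
    """
    return sessions.create_index("created_at", expireAfterSeconds=3600)

# Usage (assumed names; needs pymongo and a running server):
#   from pymongo import MongoClient
#   ensure_session_ttl(MongoClient().my_app.sessions)
```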
Another common idiom is to explicitly store the time when the document should be deleted. This is done by setting expireAfterSeconds to 0 on the index, and setting the indexed field in each document to the exact time it should expire.
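That pattern might look like this (a sketch; the expire_at field name and helper are illustrative, not part of any MongoDB API):

```python
from datetime import datetime, timedelta, timezone


def ensure_expiry_index(collection):
    # With expireAfterSeconds=0, each document is removed as soon as the
    # clock passes the datetime stored in its expire_at field.
    return collection.create_index("expire_at", expireAfterSeconds=0)


def with_expiry(doc, lifetime_seconds):
    """Return a copy of doc that MongoDB will delete lifetime_seconds from now."""
    expire_at = datetime.now(timezone.utc) + timedelta(seconds=lifetime_seconds)
    return {**doc, "expire_at": expire_at}
```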
Bear in mind that the background process that removes expired documents only runs every 60 seconds, and on a cluster under heavy load, maybe less frequently than that. So, if you're working with documents with very short-lived expiry durations, then this feature probably isn't for you. An alternative is to continue to filter by the expiry in your code, to benefit from finer-grained control over document validity, but allow the TTL expiry service to maintain the collection over time, removing documents that have very obviously expired.
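Filtering by expiry in your code might look like this (a sketch, assuming documents carry an expire_at field holding a timezone-aware datetime):

```python
from datetime import datetime, timezone


def find_valid(collection):
    # Exclude documents whose expire_at has already passed, so expiry is
    # enforced immediately even though the TTL monitor only runs periodically.
    return collection.find({"expire_at": {"$gt": datetime.now(timezone.utc)}})
```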
If you're working with data that has a lifespan, then TTL indexes are a great feature for maintaining the documents in a collection.
Capped collections are an interesting feature of MongoDB, useful if you wish to efficiently store a ring buffer of documents.
A capped collection has a maximum size in bytes and optionally a maximum number of documents. (The lower of the two values is used at any time, so if you want to reach the maximum number of documents, make sure you set the byte size large enough to handle the number of documents you wish to store.) Documents are stored in insertion order, without the need for a specific index to maintain that order, and so can handle higher throughput than an indexed collection. When the collection reaches either the set byte size or the max number of documents, the oldest documents in the collection are purged.
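Creating a capped collection with PyMongo might look like this (the collection name and limits are illustrative):

```python
def create_recent_ops(db):
    # size (bytes) is required for a capped collection; max (documents)
    # is optional, and whichever limit is reached first applies.
    return db.create_collection(
        "recent_ops", capped=True, size=5 * 1024 * 1024, max=1000
    )
```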
Capped collections can be useful for buffering recent operations (application-level operations; MongoDB's oplog is a different kind of thing), and these can be queried when an error state occurs, in order to have a log of recent operations leading up to the error state.
Or, if you just wish to efficiently store a fixed number of documents in insertion order, then capped collections are the way to go.
Note that with the improved efficiency that comes with capped collections, there are also some limitations. It is not possible to explicitly delete a document from a capped collection, although documents will eventually be replaced by newly inserted documents. Updates in a capped collection also cannot change a document's size. You can't shard a capped collection. There are some other limitations around replacing and updating documents and transactions. Read the documentation for more details.
And finally, the biggest lesser-known feature of them all! Change streams are a live stream of changes to your database. The watch method, implemented in most MongoDB drivers, streams the changes made to a collection, a database, or even your entire cluster, to your application in real time. I'm always surprised by how few people have heard of change streams, given that they're one of the first MongoDB features that really excited me. Perhaps it's just luck that I stumbled across them early.
In Python, if I wanted to print all of the changes to a collection as they're made, the code would look a bit like this:
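A sketch of that loop (the connection and collection names are assumptions; change streams require a replica set or cluster):

```python
def print_changes(collection):
    # watch() returns a blocking iterator; each item is a document
    # describing one change (insert, update, delete, ...) as it happens.
    with collection.watch() as stream:
        for change in stream:
            print(change)

# Usage (assumed names; needs pymongo and a replica set):
#   from pymongo import MongoClient
#   print_changes(MongoClient().my_database.my_collection)
```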
In this case, watch returns an iterator which blocks until a change is made to the collection, at which point it will yield a BSON document describing the change that was made.
You can also filter the types of events that will be sent to the change stream, so if you're only interested in insertions or deletions, then those are the only events you'll receive.
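For example, passing an aggregation pipeline to watch restricts which events the server delivers (a sketch; limiting to inserts and deletes here is just an illustration):

```python
def watch_inserts_and_deletes(collection):
    # The $match stage is applied server-side, so only matching change
    # events are ever sent to the application.
    pipeline = [{"$match": {"operationType": {"$in": ["insert", "delete"]}}}]
    return collection.watch(pipeline)
```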
I've used change streams (which is what the watch method returns) to implement a chat app, where changes to a collection which represented a conversation were streamed to the browser using WebSockets.
But fundamentally, change streams allow you to implement the equivalent of a database trigger, but in your favourite programming language, using all the libraries you prefer, running on the servers you specify. It's a super-powerful feature and deserves to be better known.
Further documentation on the topics discussed here: