GIANT Stories at MongoDB

Upcoming Conferences for the MongoDB Team



We try to speak about MongoDB at as many conferences and meetups as possible. If you’re interested in learning more about MongoDB or in meeting some of the people who work on it, you should try to make it out to one. Our schedule for the next couple of months is below. If you know of (or are organizing) a conference or meetup where you’d like to hear from us, shoot us an email at!

  • 10/5/2009, NoSQL NYC (NYC): Dwight will be presenting on MongoDB and Eliot will be on a panel discussion, but all of us will be at the event.

  • 10/16/2009, DC Hadoop Meetup (DC): Mike will be talking about MongoDB.

  • 10/23/2009, Strange Loop Conference (St. Louis): Mike will be discussing MongoDB.

  • 10/24/2009, Latinoware (Foz do Iguaçu, Brazil): Kristina will be talking about MongoDB for web applications.

  • 10/27/2009, NY PHP (NYC): Kristina will be talking about using MongoDB from PHP.

  • 11/7/2009, RuPy (Poznań, Poland): Mike will be talking about using MongoDB from Ruby and Python.

  • 11/14/2009, OpenSQLCamp Portland (Portland): Mike will be attending OpenSQLCamp.

  • 11/17/2009, Web 2.0 Expo (NYC): Eliot will be talking about shifting to non-relational databases.

  • 11/19/2009, RubyConf (San Francisco): Mike will be talking about using MongoDB from Ruby.

  • 11/19/2009, Interop New York (NYC): Dwight will be talking about data in the cloud.

Storing Large Objects and Files in MongoDB



Large objects, or “files”, are easily stored in MongoDB. It is no problem to store 100MB videos in the database. For example, MusicNation uses MongoDB to store its videos.

Storing files in the database has a number of advantages over storing them in a file system. Unlike a file system, the database has no problem dealing with millions of objects. We also get the power of the database when working with this data: we can run advanced queries, using indexes, to find a file, and we can do neat things like replicating the entire file set.

MongoDB stores objects in a binary format called BSON, and BinData is the BSON data type for a binary byte array. However, MongoDB objects were limited to 4MB in size when this was written (the limit has since been raised to 16MB). To deal with this, files are “chunked” into multiple objects, each smaller than the size limit. This has the added advantage of letting us efficiently retrieve a specific range of a given file.
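To make the range-retrieval advantage concrete, here is a minimal Python sketch, independent of any MongoDB driver, that splits a byte string into fixed-size chunks and works out which chunks overlap a requested byte range. The chunk size and the function names (`split_into_chunks`, `chunks_for_range`, `read_range`) are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

# Hypothetical chunk size for illustration, well under the document size limit.
CHUNK_SIZE = 256 * 1024


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a file's bytes into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunks_for_range(start: int, end: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Indexes of the chunks that overlap the byte range [start, end)."""
    if start >= end:
        return range(0)
    return range(start // chunk_size, (end - 1) // chunk_size + 1)


def read_range(chunks: list[bytes], start: int, end: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Rebuild bytes [start, end) by joining only the chunks that overlap it."""
    end = min(end, sum(len(c) for c in chunks))  # clamp to the file length
    needed = chunks_for_range(start, end, chunk_size)
    if len(needed) == 0:
        return b""
    offset = needed.start * chunk_size  # file position where the first needed chunk begins
    joined = b"".join(chunks[n] for n in needed)
    return joined[start - offset:end - offset]
```

For example, with 1000-byte chunks, a request for bytes 900 to 1100 only needs chunks 0 and 1; the rest of the file is never read.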

While we could write our own chunking code, there is a predefined standard format for this chunking, called GridFS. GridFS support is included in many MongoDB drivers and also in the mongofiles command line utility.
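In GridFS, a file is stored as one metadata document in a files collection plus one document per chunk in a chunks collection (by default `fs.files` and `fs.chunks`). The Python sketch below builds plain dictionaries in that shape without talking to a server. It is a simplified illustration, not a driver implementation: the function name `to_gridfs_like_documents` and the UUID-string `_id` values are assumptions (real drivers use ObjectIds and also handle indexes, optional metadata, and the actual writes).

```python
from __future__ import annotations

import datetime
import uuid


def to_gridfs_like_documents(filename: str, data: bytes,
                             chunk_size: int = 255 * 1024) -> tuple[dict, list[dict]]:
    """Build one files document and a list of chunk documents for `data`.

    Field names follow the GridFS layout (files: _id, filename, length,
    chunkSize, uploadDate; chunks: _id, files_id, n, data). The _id values are
    hex UUID strings standing in for the ObjectIds a real driver would generate.
    """
    file_id = uuid.uuid4().hex
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.datetime.now(datetime.timezone.utc),
    }
    # n is the zero-based position of each chunk within the file.
    chunk_docs = [
        {
            "_id": uuid.uuid4().hex,
            "files_id": file_id,
            "n": n,
            "data": data[offset:offset + chunk_size],
        }
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs
```

Reading a file back then amounts to fetching its chunk documents by `files_id`, ordered by `n`, and concatenating their `data` fields.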

A good way to do a quick test of this facility is to try out the mongofiles utility. See the MongoDB documentation for more information on GridFS.


This post was updated in December 2014 to include additional resources and updated links.

MongoDB is Fantastic for Logging



We’re all quite used to having log files on lots of servers, in disparate places. Wouldn’t it be nice to have centralized logs for a production system? Logs that can be queried?

I would encourage everyone to consider using MongoDB for log centralization. It’s a very good fit for this problem for several reasons:

  1. MongoDB inserts can be done asynchronously. One wouldn’t want a user’s experience to grind to a halt if logging were slow, stalled or down. MongoDB provides the ability to fire off an insert into a log collection and not wait for a response code. (If one wants a response, one calls getLastError() – we would skip that here.)
  2. Old log data automatically ages out. By using capped collections, we preallocate space for logs, and once that space is full, the collection wraps around and reuses it, overwriting the oldest entries first. There is no risk of filling up a disk with excessive log information, and no need to write log archival or deletion scripts.
  3. It’s fast enough for the problem. First, MongoDB is fast in general. Second, a capped collection automatically preserves insertion order, so we don’t need to create an index on timestamp. Skipping that index makes inserts even faster, which matters because logging involves far more writes than reads (the opposite of most database workloads).
  4. Document-oriented / JSON is a great format for log information. It is very flexible and “schemaless”, in the sense that we can add an extra field any time we want.
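The capped-collection behavior described in points 2 through 4 can be imitated in a few lines of Python to show why no cleanup scripts or timestamp index are needed. This is an in-memory stand-in, not MongoDB: it caps by entry count using `collections.deque(maxlen=...)`, whereas a real capped collection is sized in bytes (with an optional maximum document count). The class name `CappedLog` is hypothetical.

```python
from __future__ import annotations

from collections import deque
from datetime import datetime, timezone


class CappedLog:
    """In-memory stand-in for a capped log collection, capped by entry count."""

    def __init__(self, max_entries: int):
        # A deque with maxlen discards its oldest item when appending to a full deque,
        # mirroring how a capped collection overwrites its oldest entries.
        self._entries: deque[dict] = deque(maxlen=max_entries)

    def insert(self, **fields) -> None:
        # Schemaless: any extra fields are stored alongside the timestamp.
        self._entries.append({"ts": datetime.now(timezone.utc), **fields})

    def find(self, **criteria) -> list[dict]:
        # Results come back in insertion order without any timestamp index.
        return [e for e in self._entries
                if all(e.get(k) == v for k, v in criteria.items())]

    def __len__(self) -> int:
        return len(self._entries)


log = CappedLog(max_entries=3)
for i in range(5):
    log.insert(level="info", msg=f"request {i}")
log.insert(level="error", msg="timeout", host="web1")  # extra field, no schema change
print([e["msg"] for e in log.find()])
```

Only the three newest entries survive, in the order they were inserted. With a real server, the equivalent collection would be created with a byte size, for example in PyMongo via `db.create_collection("log", capped=True, size=...)`.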

The MongoDB profiler works very much in the way outlined above, storing profile timings in a collection that is very log-like. We have been very happy with that implementation to date.

MongoDB Memory Usage



Mongo uses memory-mapped files to access data, which results in large numbers being displayed for the mongod process in tools like top. This is normal when using memory-mapped files. Basically, the total size of the mapped data files shows up as the process’s virtual size, while resident bytes show how much of that data is currently cached in RAM. The larger your data files, the higher the virtual size (vmsize) of the mongod process. (This is also why Mongo is best run on 64-bit operating systems: a 32-bit address space limits how much data can be mapped.)

If other processes on the box need more RAM, the operating system’s virtual memory manager will reclaim some of the memory used for the cache, and the resident bytes of the mongod process will drop.
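The underlying mechanism can be seen with Python’s standard `mmap` module. This sketch is only an analogy for what mongod does with its data files, not MongoDB code: it maps an entire file, which reserves virtual address space for all of it (what top reports as virtual size), while reading a small slice typically makes the OS load only the pages actually touched, plus any readahead (what counts toward resident size). The helper names `read_via_mmap` and `demo` are hypothetical.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Map the whole file into the address space, then read one slice of it."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file (the file must not be empty). This reserves
        # virtual address space for all of it; pages are loaded into RAM on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


def demo() -> bytes:
    # Write a roughly 20 KB scratch file and read 5 bytes from its middle.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"a" * 10_000 + b"hello" + b"b" * 10_000)
        return read_via_mmap(path, 10_000, 5)
    finally:
        os.remove(path)


print(demo())
```

The same split between a large virtual size and a smaller resident size is what top shows for mongod.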

You can get a feel for the “inherent” memory footprint of Mongo by starting it fresh, with no connections and an empty /data/db directory, and looking at the resident bytes. (Running with the --nojni option results in even lower core memory usage, and the forthcoming move to SpiderMonkey should bring the default footprint closer to that level.)