Navigation
This version of the documentation is archived and no longer supported.

Shard GridFS Data Store

When sharding a GridFS store, consider the following:

files Collection

Most deployments will not need to shard the files collection. The files collection is typically small, and only contains metadata. None of the required keys for GridFS lend themselves to an even distribution in a sharded situation. If you must shard the files collection, use the _id field possibly in combination with an application field.

Leaving files unsharded means that all the file metadata documents live on one shard. For production GridFS stores you must store the files collection on a replica set.

chunks Collection

To shard the chunks collection by { files_id : 1 , n : 1 }, issue commands similar to the following:

db.fs.chunks.ensureIndex( { files_id : 1 , n : 1 } )

db.runCommand( { shardCollection : "test.fs.chunks" , key : { files_id : 1 , n : 1 } } )

You may also want to shard using just the file_id field, as in the following operation:

db.runCommand( { shardCollection : "test.fs.chunks" , key : {  files_id : 1 } } )

Important

{ files_id : 1 , n : 1 } and {  files_id : 1 } are the only supported shard keys for the chunks collection of a GridFS store.

Note

Changed in version 2.2.

Before 2.2, you had to create an additional index on files_id to shard using only this field.

The default files_id value is an ObjectId, as a result the values of files_id are always ascending, and applications will insert all new GridFS data to a single chunk and shard. If your write load is too high for a single server to handle, consider a different shard key or use a different value for _id in the files collection.