GridFS¶
GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16MB.
Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, [1] and stores each of those chunks as a separate document. By default, GridFS limits chunk size to 256 kilobytes. GridFS uses two collections to store files: one collection stores the file chunks, and the other stores file metadata.
When you query a GridFS store for a file, the driver or client will reassemble the chunks as needed. You can perform range queries on files stored through GridFS. You also can access information from arbitrary sections of files, which allows you to “skip” into the middle of a video or audio file.
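As a rough illustration of how "skipping" into the middle of a file can work, the sketch below computes which chunk sequence numbers cover a requested byte range. The function name `chunks_for_range` and its half-open `[start, end)` convention are assumptions made for this example; actual drivers expose their own seek and read APIs.

```python
def chunks_for_range(start, end, chunk_size=256 * 1024):
    """Return the chunk sequence numbers covering bytes [start, end).

    Hypothetical helper for illustration; chunk_size defaults to the
    256-kilobyte default described in this document.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    first = start // chunk_size        # chunk holding the first byte
    last = (end - 1) // chunk_size     # chunk holding the last byte
    return range(first, last + 1)
```

Only the chunks in the returned range need to be fetched, so a reader can jump to an offset without loading the chunks that precede it.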
GridFS is useful not only for storing files that exceed 16MB but also for storing any files that you want to access without loading the entire file into memory. For more information on when to use GridFS, see When should I use GridFS?
[1] The use of the term chunks in the context of GridFS is not related to the use of the term chunks in the context of sharding.
Implement GridFS¶
To store and retrieve files using GridFS, use either of the following:
- A MongoDB driver. See the drivers documentation for information on using GridFS with your driver.
- The mongofiles command-line tool, run from the system command line. See mongofiles.
GridFS Collections¶
GridFS stores files in two collections:
- chunks stores the binary chunks. For details, see The chunks Collection.
- files stores the file’s metadata. For details, see The files Collection.
GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, GridFS uses two collections with names prefixed by the fs bucket:
- fs.files
- fs.chunks
You can choose a different bucket name than fs, and create multiple buckets in a single database.
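The naming rule can be sketched in a few lines. The helper `bucket_collections` is hypothetical and only shows how a bucket name maps to its two collection names; the "contracts" bucket in the usage note is an arbitrary example name.

```python
def bucket_collections(bucket="fs"):
    """Return the (files, chunks) collection names for a bucket.

    Hypothetical helper: it only applies the prefixing rule described
    above and does not talk to a database.
    """
    return f"{bucket}.files", f"{bucket}.chunks"
```

For example, `bucket_collections("contracts")` yields `contracts.files` and `contracts.chunks`, which can live alongside the default `fs` collections in the same database.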
The chunks Collection¶
Each document in the chunks collection represents a distinct chunk of a file as represented in the GridFS store. The following is a prototype document from the chunks collection:
A document from the chunks collection contains the following fields:
- chunks.files_id: The _id of the “parent” document, as specified in the files collection.
- chunks.n: The sequence number of the chunk. GridFS numbers all chunks, starting with 0.
The chunks collection uses a compound index on files_id and n, as described in GridFS Index.
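To make the roles of files_id and n concrete, here is a minimal sketch that splits a byte string into chunk-shaped dictionaries and reassembles them. It runs purely in memory and is not how any driver is implemented; the function names are hypothetical, and the `data` key holding the payload is an assumption, since it is not among the fields described above.

```python
def split_into_chunks(files_id, data, chunk_size=256 * 1024):
    """Split data into chunk-like dicts numbered from 0 (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        {
            "files_id": files_id,                       # _id of the parent files document
            "n": n,                                     # sequence number, starting with 0
            "data": data[offset:offset + chunk_size],   # assumed payload field
        }
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes by concatenating chunks in order of n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
```

Because each chunk carries its own sequence number, the chunks can be stored and fetched in any order and still be put back together correctly.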
The files Collection¶
Each document in the files collection represents a file in the GridFS store. Consider the following prototype of a document in the files collection:
Documents in the files collection contain some or all of the following fields. Applications may create additional arbitrary fields:
- files._id: The unique ID for this document. The _id is of the data type you chose for the original document. The default type for MongoDB documents is BSON ObjectId.
- files.length: The size of the document in bytes.
- files.chunkSize: The size of each chunk. GridFS divides the document into chunks of the size specified here. The default size is 256 kilobytes.
- files.uploadDate: The date the document was first stored by GridFS. This value has the Date type.
- files.md5: An MD5 hash returned from the filemd5 API. This value has the String type.
- files.filename: Optional. A human-readable name for the document.
- files.contentType: Optional. A valid MIME type for the document.
- files.aliases: Optional. An array of alias strings.
- files.metadata: Optional. Any additional information you want to store.
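The fields above can be pictured with a small sketch that builds a files-shaped dictionary for an in-memory byte string. The function `make_files_document` is hypothetical. As an assumption for illustration, the MD5 value is computed locally with `hashlib` rather than obtained from the filemd5 API, and the upload date is simply the current UTC time.

```python
import hashlib
from datetime import datetime, timezone


def make_files_document(file_id, data, chunk_size=256 * 1024,
                        filename=None, content_type=None, metadata=None):
    """Build a dict shaped like a files-collection document (illustrative only)."""
    doc = {
        "_id": file_id,
        "length": len(data),                      # size in bytes
        "chunkSize": chunk_size,                  # size of each chunk
        "uploadDate": datetime.now(timezone.utc),
        "md5": hashlib.md5(data).hexdigest(),     # local stand-in for filemd5
    }
    # Optional fields are included only when a value is supplied.
    if filename is not None:
        doc["filename"] = filename
    if content_type is not None:
        doc["contentType"] = content_type
    if metadata is not None:
        doc["metadata"] = metadata
    return doc
```

Leaving the optional keys out entirely, rather than storing empty values, mirrors the statement that documents contain "some or all" of the listed fields.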
GridFS Index¶
GridFS uses a unique, compound index on the chunks collection for files_id and n. The index allows efficient retrieval of chunks using the files_id and n values, for example when fetching all chunks of a given file sorted by n.
See the relevant driver documentation for the specific behavior of your GridFS application. If your driver does not create this index, create it yourself from the mongo shell as a unique index on the chunks collection with the key { files_id: 1, n: 1 }.
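What a unique compound index on files_id and n provides can be emulated in plain Python. The class below is a toy model written for this explanation, not MongoDB's index implementation: it keeps keys sorted so that all chunks of one file can be read in order of n, and it rejects a second chunk with the same (files_id, n) pair. The class and method names are hypothetical.

```python
from bisect import bisect_left, insort


class ChunkIndex:
    """Toy stand-in for a unique compound index on (files_id, n)."""

    def __init__(self):
        self._keys = []   # sorted list of (files_id, n) tuples
        self._docs = {}   # (files_id, n) -> chunk document

    def insert(self, doc):
        key = (doc["files_id"], doc["n"])
        if key in self._docs:
            # Uniqueness: the same file cannot have two chunks numbered n.
            raise ValueError(f"duplicate key: {key}")
        insort(self._keys, key)
        self._docs[key] = doc

    def chunks_for(self, files_id):
        """Return the chunks of one file, ordered by n."""
        # (files_id,) sorts before every (files_id, n), so this finds
        # the first key belonging to the requested file.
        start = bisect_left(self._keys, (files_id,))
        result = []
        for key in self._keys[start:]:
            if key[0] != files_id:
                break
            result.append(self._docs[key])
        return result
```

Because the keys are ordered first by files_id and then by n, the chunks of a file sit next to each other and come back already sorted, which is the access pattern needed to reassemble a file.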
Example Interface¶
The following is an example of the GridFS interface in Java. The example is for demonstration purposes only. For API specifics, see the relevant driver documentation.
By default, the interface must support the default GridFS bucket, named fs, as in the following:
Optionally, interfaces may support additional GridFS buckets, as in the following example: