This version of the documentation is archived and no longer supported.

BSON Documents

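MongoDB is a document-based database system, and as a result, all records, or data, in MongoDB are documents. Documents are the default representation of most user-accessible data structures in the database. Documents provide structure for data in the following MongoDB contexts:

  • the records stored in collections
  • the query selectors that determine which records to select for read, update, and delete operations
  • the update actions that specify the modifications to perform during an update operation
  • the specification of indexes for a collection
  • the sort order specification for the sort() method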

Structure

MongoDB documents are BSON objects with support for the full range of BSON types; conceptually, however, BSON documents are similar to JSON objects and have the following structure:

{
   field1: value1,
   field2: value2,
   field3: value3,
   ...
   fieldN: valueN
}

Because they support the full range of BSON types, MongoDB documents may contain field and value pairs where the value is another document, an array, an array of documents, or a basic type such as Double, String, or Date. See also BSON Type Considerations.

Consider the following document that contains values of varying types:

var mydoc = {
               _id: ObjectId("5099803df3f4948bd2f98391"),
               name: { first: "Alan", last: "Turing" },
               birth: new Date('Jun 23, 1912'),
               death: new Date('Jun 07, 1954'),
               contribs: [ "Turing machine", "Turing test", "Turingery" ],
               views : NumberLong(1250000)
            }

The document contains the following fields:

  • _id that holds an ObjectId.
  • name that holds a subdocument that contains the fields first and last.
  • birth and death, which both have Date types.
  • contribs that holds an array of strings.
  • views that holds a value of NumberLong type.

All field names are strings in BSON documents. Be aware that there are some restrictions on field names for BSON documents: field names cannot contain null characters, dots (.), or dollar signs ($).
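As an illustration, the following plain JavaScript sketch checks a single field name against the three restrictions just listed. The helper name isValidBsonFieldName is hypothetical; it is not part of the mongo shell or any driver, and it only encodes the rules stated in this paragraph.

```javascript
// Hypothetical helper: checks a field name against the restrictions listed above
// (no null characters, no dots, no dollar signs). Not a mongo shell or driver API.
function isValidBsonFieldName(name) {
  if (typeof name !== "string") return false; // all field names are strings
  return !name.includes("\u0000") && !name.includes(".") && !name.includes("$");
}

console.log(isValidBsonFieldName("contribs"));   // true
console.log(isValidBsonFieldName("name.first")); // false: contains a dot
console.log(isValidBsonFieldName("$views"));     // false: contains a dollar sign
```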

Note

BSON documents may have more than one field with the same name; however, most MongoDB interfaces represent documents with a structure (e.g. a hash table) that does not support duplicate field names. If you need to manipulate documents that have more than one field with the same name, see your driver’s documentation for more information.

Some documents created by internal MongoDB processes may have duplicate fields, but no MongoDB process will ever add duplicate keys to an existing user document.
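Standard JavaScript offers a quick analogy for why a hash-table style representation cannot hold duplicate field names: when JSON.parse meets a repeated key, only the last value survives. This is plain JavaScript, used here only as an analogy for a driver's document type, not as a description of any particular driver.

```javascript
// A JSON text with the field "views" repeated twice.
const text = '{ "_id": 1, "views": 100, "views": 200 }';

// JavaScript objects, like most hash-table document types, keep one value per key,
// so the later "views" overwrites the earlier one.
const parsed = JSON.parse(text);

console.log(Object.keys(parsed)); // [ '_id', 'views' ]
console.log(parsed.views);        // 200
```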

Type Operators

To determine the type of fields, the mongo shell provides the following operators:

  • instanceof returns a boolean to test if a value has a specific type.
  • typeof returns the type of a field.

Example

Consider the following operations using instanceof and typeof:

  • The following operation tests whether the _id field is of type ObjectId:

    mydoc._id instanceof ObjectId
    

    The operation returns true.

  • The following operation returns the type of the _id field:

    typeof mydoc._id
    

    In this case, typeof returns the more generic object type rather than the ObjectId type.
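The same behavior follows from ordinary JavaScript semantics. The sketch below runs outside the mongo shell, so it uses a hypothetical ObjectIdLike class as a stand-in for the shell's ObjectId type, alongside the built-in Date type.

```javascript
// Hypothetical stand-in for the mongo shell's ObjectId type (illustration only).
class ObjectIdLike {
  constructor(hex) {
    this.str = hex;
  }
}

const doc = {
  _id: new ObjectIdLike("5099803df3f4948bd2f98391"),
  birth: new Date(Date.UTC(1912, 5, 23)), // months are zero-indexed: 5 is June
  contribs: ["Turing machine", "Turing test", "Turingery"],
};

// typeof only distinguishes broad categories, so every object reports "object".
console.log(typeof doc._id);      // object
console.log(typeof doc.birth);    // object
console.log(typeof doc.contribs); // object

// instanceof tests the constructor, which identifies the specific type.
console.log(doc._id instanceof ObjectIdLike); // true
console.log(doc.birth instanceof Date);       // true
console.log(Array.isArray(doc.contribs));     // true
```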

Dot Notation

MongoDB uses dot notation to access the elements of an array and the fields of a subdocument.

To access an element of an array by its zero-based index position, concatenate the array name, a dot (.), and the index position:

'<array>.<index>'

To access a field of a subdocument with dot notation, concatenate the subdocument name, a dot (.), and the field name:

'<subdocument>.<field>'
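To make the two path forms concrete, the following hypothetical getByPath function resolves a dot-notation path against a plain JavaScript object. It is a simplified sketch: it walks the path one step at a time and does not reproduce MongoDB query behavior such as matching a field inside every element of an array.

```javascript
// Hypothetical helper: resolves '<array>.<index>' and '<subdocument>.<field>' paths.
// Returns undefined when any step of the path is missing.
function getByPath(doc, path) {
  let current = doc;
  for (const part of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    // Steps into an array must be zero-based integer positions.
    if (Array.isArray(current) && !/^\d+$/.test(part)) return undefined;
    current = Object.prototype.hasOwnProperty.call(current, part) ? current[part] : undefined;
  }
  return current;
}

const mydoc = {
  name: { first: "Alan", last: "Turing" },
  contribs: ["Turing machine", "Turing test", "Turingery"],
};

console.log(getByPath(mydoc, "name.last"));  // Turing
console.log(getByPath(mydoc, "contribs.1")); // Turing test
```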

See also

  • Subdocuments for dot notation examples with subdocuments.
  • Arrays for dot notation examples with arrays.

Document Types in MongoDB

Record Documents

Most documents in MongoDB collections store data from users’ applications.

These documents have the following attributes:

  • The maximum BSON document size is 16 megabytes.

    The maximum document size helps ensure that a single document cannot use an excessive amount of RAM or, during transmission, an excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles and the documentation for your driver for more information about GridFS.

  • Documents have the following restrictions on field names:

    • The field name _id is reserved for use as a primary key; its value must be unique in the collection, is immutable, and may be of any type other than an array.
    • The field names cannot start with the $ character.
    • The field names cannot contain the . character.
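To see how the 16 megabyte limit is measured, the following sketch estimates the serialized size of a plain JavaScript object from the BSON encoding layout: a 4-byte length prefix, one element per field (a type byte, the null-terminated field name, and the value), and a trailing null byte. bsonSize is a hypothetical helper that covers only strings, booleans, null, numbers, Date values, arrays, and subdocuments. It assumes integers that fit in 32 bits are stored as int32 and other numbers as doubles, a choice that individual drivers may make differently, and it assumes a runtime with the standard TextEncoder global, such as current Node.js or a browser.

```javascript
const MAX_BSON_SIZE = 16 * 1024 * 1024; // 16 megabytes = 16777216 bytes

function utf8Length(s) {
  return new TextEncoder().encode(s).length;
}

// Size of a value's payload, excluding its type byte and field name.
function valueSize(v) {
  if (typeof v === "string") return 4 + utf8Length(v) + 1; // int32 length + bytes + NUL
  if (typeof v === "boolean") return 1;
  if (v === null) return 0;
  if (typeof v === "number") {
    // Assumption: 32-bit integers become int32, everything else a double.
    return Number.isInteger(v) && v >= -(2 ** 31) && v < 2 ** 31 ? 4 : 8;
  }
  if (v instanceof Date) return 8; // UTC datetime: int64 milliseconds
  if (Array.isArray(v)) {
    // BSON encodes arrays as documents whose keys are "0", "1", "2", ...
    return bsonSize(Object.fromEntries(v.map((x, i) => [String(i), x])));
  }
  if (typeof v === "object") return bsonSize(v); // embedded document
  throw new TypeError("unsupported type in this sketch: " + typeof v);
}

// Hypothetical helper: estimated BSON size of a document, in bytes.
function bsonSize(doc) {
  let size = 4 + 1; // int32 total length + trailing NUL
  for (const [name, value] of Object.entries(doc)) {
    size += 1 + utf8Length(name) + 1 + valueSize(value); // type byte + cstring name + value
  }
  return size;
}

const record = { _id: 1, name: { first: "John", last: "Backus" } };
console.log(bsonSize(record));                  // 58
console.log(bsonSize(record) <= MAX_BSON_SIZE); // true
```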

Note

Most MongoDB driver clients will include the _id field and generate an ObjectId before sending the insert operation to MongoDB; however, if the client sends a document without an _id field, the mongod will add the _id field and generate the ObjectId.

The following document specifies a record in a collection:

{
  _id: 1,
  name: { first: 'John', last: 'Backus' },
  birth: new Date('Dec 03, 1924'),
  death: new Date('Mar 17, 2007'),
  contribs: [ 'Fortran', 'ALGOL', 'Backus-Naur Form', 'FP' ],
  awards: [
            { award: 'National Medal of Science',
              year: 1975,
              by: 'National Science Foundation' },
            { award: 'Turing Award',
              year: 1977,
              by: 'ACM' }
          ]
}

The document contains the following fields:

  • _id, which must hold a unique value and is immutable.
  • name that holds another document. This sub-document contains the fields first and last, which both hold strings.
  • birth and death, which both have Date types.
  • contribs that holds an array of strings.
  • awards that holds an array of documents.

Consider the following behavior and constraints of the _id field in MongoDB documents:

  • In documents, the _id field is always indexed for regular collections.
  • The _id field may contain values of any BSON data type other than an array.

Consider the following options for the value of an _id field:

  • Use an ObjectId. See the ObjectId documentation.

    Although it is common to assign ObjectId values to _id fields, if your objects have a natural unique identifier, consider using that for the value of _id to save space and to avoid an additional index.

  • Generate a sequence number for the documents in your collection in your application and use this value for the _id value. See the Create an Auto-Incrementing Sequence Field tutorial for an implementation pattern.

  • Generate a UUID in your application code. For more efficient storage of the UUID values in the collection and in the _id index, store the UUID as a value of the BSON BinData type.

    Index keys that are of the BinData type are more efficiently stored in the index if:

    • the binary subtype value is in the range of 0-7 or 128-135, and
    • the length of the byte array is: 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, or 32.
  • Use your driver’s BSON UUID facility to generate UUIDs. Be aware that driver implementations may implement UUID serialization and deserialization logic differently, which may not be fully compatible with other drivers. See your driver documentation for information concerning UUID interoperability.
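The two BinData conditions above can be written as a small predicate. The following sketch uses hypothetical helpers, not a driver API: uuidToBytes converts a UUID string into its 16 raw bytes in the order they appear in the string, and isCompactBinDataKey checks the subtype and length conditions. The BSON specification defines binary subtype 4 for UUIDs (subtype 3 is the older UUID subtype, whose byte order varies between drivers), so a standard UUID stored as 16 bytes of BinData meets both conditions.

```javascript
// Byte-array lengths listed above for efficiently stored BinData index keys.
const COMPACT_LENGTHS = new Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 32]);

// Hypothetical predicate encoding the two conditions listed above.
function isCompactBinDataKey(subtype, byteLength) {
  const subtypeOk = (subtype >= 0 && subtype <= 7) || (subtype >= 128 && subtype <= 135);
  return subtypeOk && COMPACT_LENGTHS.has(byteLength);
}

// Hypothetical helper: parses "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" into 16 bytes.
function uuidToBytes(uuid) {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) throw new Error("not a UUID: " + uuid);
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

const UUID_SUBTYPE = 4; // BSON binary subtype for UUIDs
const bytes = uuidToBytes("123e4567-e89b-12d3-a456-426614174000");

console.log(bytes.length);                                    // 16
console.log(isCompactBinDataKey(UUID_SUBTYPE, bytes.length)); // true
console.log(isCompactBinDataKey(0, 36)); // false: 36 is not a listed length
```

By contrast, storing the same UUID as its 36-character string would use a longer value that does not benefit from this compact key encoding.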

Query Specification Documents

Query documents specify the conditions that determine which records to select for read, update, and delete operations. You can use <field>:<value> expressions to specify the equality condition and query operator expressions to specify additional conditions.

When passed as an argument to methods such as the find() method, the remove() method, or the update() method, the query document selects documents for MongoDB to return, remove, or update, as in the following:

db.bios.find( { _id: 1 } )
db.bios.remove( { _id: { $gt: 3 } } )
db.bios.update( { _id: 1, name: { first: 'John', last: 'Backus' } },
                { $set: { 'name.middle': 'Warner' } } )
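As a rough model of how a query document selects records, the following hypothetical matches function handles only the two forms used above: top-level <field>:<value> equality and the $gt operator. It is an illustration over an in-memory sample array, not how mongod evaluates queries. It compares subdocuments as a whole, including field order, which mirrors the fact that an exact match on a subdocument such as name must list the fields in the same order as the stored document.

```javascript
// Compares two values structurally; JSON.stringify keeps key order, so
// { first, last } and { last, first } are treated as different subdocuments.
function valuesEqual(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Hypothetical sketch supporting only equality and $gt on top-level fields.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    const isGt =
      condition !== null && typeof condition === "object" && !Array.isArray(condition) && "$gt" in condition;
    if (isGt) {
      return value !== undefined && value > condition.$gt;
    }
    return valuesEqual(value, condition);
  });
}

// Hypothetical sample records.
const bios = [
  { _id: 1, name: { first: "John", last: "Backus" } },
  { _id: 4, name: { first: "Alan", last: "Turing" } },
];

console.log(bios.filter((d) => matches(d, { _id: 1 })).length); // 1
console.log(bios.filter((d) => matches(d, { _id: { $gt: 3 } })).map((d) => d._id)); // [ 4 ]
console.log(matches(bios[0], { name: { last: "Backus", first: "John" } })); // false: field order differs
```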

See also

  • Query Document and Read for more examples on selecting documents for reads.
  • Update for more examples on selecting documents for updates.
  • Delete for more examples on selecting documents for deletes.

Update Specification Documents

Update documents specify the data modifications to perform during an update() operation to modify existing records in a collection. You can use update operators to specify the exact actions to perform on the document fields.

Consider the following update document:

{
  $set: { 'name.middle': 'Warner' },
  $push: { awards: { award: 'IBM Fellow',
                     year: 1963,
                     by: 'IBM' }
         }
}

When passed as an argument to the update() method, the update document:

  • Modifies the name field, whose value is another document. Specifically, the $set operator sets the middle field in the name subdocument, adding the field if it does not exist. The document uses dot notation to access a field in a subdocument.
  • Adds an element to the awards field, whose value is an array. Specifically, the $push operator appends another document as an element of the awards array.

The following operation passes this update document to update() for the record whose _id is 1:

db.bios.update(
   { _id: 1 },
   {
     $set: { 'name.middle': 'Warner' },
     $push: { awards: {
                        award: 'IBM Fellow',
                        year: 1963,
                        by: 'IBM'
                      }
            }
   }
)
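To show what the two operators do to the stored record, the following sketch applies $set (with a dot-notation path) and $push to a plain JavaScript copy of the document. applyUpdate is a hypothetical helper that supports only these two operators; it is not how mongod applies updates.

```javascript
// Hypothetical sketch: applies $set and $push update operators to a plain object in place.
function applyUpdate(doc, update) {
  for (const [path, value] of Object.entries(update.$set || {})) {
    const parts = path.split(".");
    let target = doc;
    // Walk (and create, if missing) the subdocuments named by the dot-notation path.
    for (const part of parts.slice(0, -1)) {
      if (target[part] === undefined) target[part] = {};
      if (target[part] === null || typeof target[part] !== "object") {
        throw new TypeError(part + " is not a subdocument");
      }
      target = target[part];
    }
    target[parts[parts.length - 1]] = value;
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = [];
    if (!Array.isArray(doc[field])) throw new TypeError(field + " is not an array");
    doc[field].push(value); // appends the whole value as one new element
  }
  return doc;
}

const record = {
  _id: 1,
  name: { first: "John", last: "Backus" },
  awards: [
    { award: "National Medal of Science", year: 1975, by: "National Science Foundation" },
    { award: "Turing Award", year: 1977, by: "ACM" },
  ],
};

applyUpdate(record, {
  $set: { "name.middle": "Warner" },
  $push: { awards: { award: "IBM Fellow", year: 1963, by: "IBM" } },
});

console.log(record.name);          // { first: 'John', last: 'Backus', middle: 'Warner' }
console.log(record.awards.length); // 3
```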

See also

  • update operators page for the available update operators and syntax.
  • update for more examples on update documents.

For additional examples of updates that involve array elements, including where the elements are documents, see the $ positional operator.

Index Specification Documents

Index specification documents describe the fields to index during index creation. See indexes for an overview of indexes. [1]

Index documents contain field and value pairs, in the following form:

{ field: value }
  • field is the field in the documents to index.
  • value is either 1 for ascending or -1 for descending.

The following document specifies a compound index on the _id field and the last field contained in the subdocument name field. The document uses dot notation to access a field in a subdocument:

{ _id: 1, 'name.last': 1 }

When passed as an argument to the ensureIndex() method, the index document specifies the index to create:

db.bios.ensureIndex( { _id: 1, 'name.last': 1 } )
[1] Indexes optimize a number of key read and write operations.

Sort Order Specification Documents

Sort order documents specify the order of documents that a find() query returns. Pass sort order specification documents as an argument to the sort() method. See the sort() page for more information on sorting.

The sort order documents contain field and value pairs, in the following form:

{ field: value }
  • field is the field by which to sort documents.
  • value is either 1 for ascending or -1 for descending.

The following document specifies a sort order on fields of the name subdocument: first by the last field in ascending order, then by the first field, also in ascending order:

{ 'name.last': 1, 'name.first': 1 }

When passed as an argument to the sort() method, the sort order document sorts the results of the find() method:

db.bios.find().sort( { 'name.last': 1, 'name.first': 1 } )
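The following sketch turns a sort order document into a JavaScript comparator, resolving dot-notation paths and applying 1 or -1 for each field in the order the fields appear. makeComparator is hypothetical, and it compares values with JavaScript's < and > operators rather than MongoDB's full cross-type BSON comparison order; the records are hypothetical samples.

```javascript
// Resolves a dot-notation path such as 'name.last' on a plain object.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// Hypothetical helper: builds a comparator from a sort document like { 'name.last': 1 }.
function makeComparator(sortSpec) {
  const keys = Object.entries(sortSpec); // field order defines sort priority
  return (a, b) => {
    for (const [path, direction] of keys) {
      const x = getPath(a, path);
      const y = getPath(b, path);
      if (x < y) return -direction; // 1 keeps a first, -1 puts it after b
      if (x > y) return direction;
    }
    return 0; // equal on every sort key
  };
}

// Hypothetical sample records.
const people = [
  { name: { first: "Alan", last: "Turing" } },
  { name: { first: "John", last: "Backus" } },
  { name: { first: "Ada", last: "Backus" } },
];

people.sort(makeComparator({ "name.last": 1, "name.first": 1 }));
console.log(people.map((p) => p.name.first + " " + p.name.last));
// [ 'Ada Backus', 'John Backus', 'Alan Turing' ]
```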

BSON Type Considerations

The following BSON types require special consideration:

ObjectId

ObjectIds are small, likely unique, fast to generate, and ordered. These values consist of 12 bytes, where the first 4 bytes are a timestamp that reflects the ObjectId’s creation. Refer to the ObjectId documentation for more information.
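Because the first 4 bytes hold the creation time as a big-endian count of seconds since the Unix epoch, the timestamp can be read back from the 24-character hexadecimal form of an ObjectId. The following sketch applies a hypothetical objectIdTimestamp helper to the _id value from the example document earlier on this page; it reads only the timestamp and ignores the remaining 8 bytes.

```javascript
// Hypothetical helper: reads the creation time from an ObjectId's hex string.
// The first 4 bytes (8 hex characters) are a big-endian count of seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) throw new Error("expected 24 hex characters");
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("5099803df3f4948bd2f98391");
console.log(created.getTime() / 1000); // 1352237117
console.log(created.toISOString());    // 2012-11-06T21:25:17.000Z
```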

String

BSON strings are UTF-8. In general, drivers for each programming language convert from the language’s string format to UTF-8 when serializing and deserializing BSON. This makes it possible to store most international characters in BSON strings with ease. [2] In addition, MongoDB $regex queries support UTF-8 in the regex string.

[2] Given strings using UTF-8 character sets, using sort() on strings will be reasonably correct; however, because sort() internally uses the C++ strcmp API, the sort order may handle some characters incorrectly.
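To see why a byte-wise comparison can misplace some characters, the following sketch compares strings by their UTF-8 bytes, in the manner of strcmp, and contrasts the result with a locale-aware comparison. utf8Compare is a hypothetical helper; it assumes a runtime with the standard TextEncoder global, and the exact order MongoDB produced in this version depends on its internal implementation.

```javascript
// Hypothetical helper: compares two strings byte by byte over their UTF-8 encodings,
// similar in spirit to C's strcmp on UTF-8 data.
function utf8Compare(a, b) {
  const x = new TextEncoder().encode(a);
  const y = new TextEncoder().encode(b);
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return x.length - y.length; // a shorter prefix sorts first
}

const words = ["f", "é", "e"];

// Byte order: "é" is encoded as 0xC3 0xA9, which is greater than "f" (0x66).
console.log([...words].sort(utf8Compare)); // [ 'e', 'f', 'é' ]

// A locale-aware comparison typically places "é" next to "e" instead.
console.log([...words].sort((a, b) => a.localeCompare(b, "fr")));
```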

Timestamps

BSON has a special timestamp type for internal MongoDB use that is not associated with the regular Date type. A Timestamp value is a 64-bit value where:

  • the first 32 bits are a time_t value (seconds since the Unix epoch)
  • the second 32 bits are an incrementing ordinal for operations within a given second.

Within a single mongod instance, timestamp values are always unique.
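The layout above can be sketched with JavaScript BigInt values: the time_t seconds occupy the high-order 32 bits and the ordinal the low-order 32 bits, so comparing the combined 64-bit values orders timestamps by second first and by ordinal within a second. packTimestamp and unpackTimestamp are hypothetical helpers, not mongo shell functions.

```javascript
// Hypothetical helpers modeling the 64-bit BSON timestamp layout.
function packTimestamp(seconds, ordinal) {
  if (!Number.isInteger(seconds) || seconds < 0 || seconds > 0xffffffff) {
    throw new RangeError("seconds must fit in 32 bits");
  }
  if (!Number.isInteger(ordinal) || ordinal < 0 || ordinal > 0xffffffff) {
    throw new RangeError("ordinal must fit in 32 bits");
  }
  return (BigInt(seconds) << 32n) | BigInt(ordinal);
}

function unpackTimestamp(ts) {
  return { seconds: Number(ts >> 32n), ordinal: Number(ts & 0xffffffffn) };
}

const a = packTimestamp(1352237117, 1);
const b = packTimestamp(1352237117, 2); // next operation within the same second
const c = packTimestamp(1352237118, 1); // an operation in the following second

console.log(a < b && b < c);     // true
console.log(unpackTimestamp(b)); // { seconds: 1352237117, ordinal: 2 }
```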

In replication, the oplog has a ts field. The values in this field reflect the operation time, which uses a BSON timestamp value.

Note

The BSON Timestamp type is for internal MongoDB use. In most cases, application development should use the BSON Date type instead. See Date for more information.

If you create a BSON Timestamp using the empty constructor (e.g. new Timestamp()), MongoDB will only generate a timestamp if you use the constructor in the first field of the document. [3] Otherwise, MongoDB will generate an empty timestamp value (i.e. Timestamp(0, 0)).

Changed in version 2.1: the mongo shell displays the Timestamp value with the wrapper:

Timestamp(<time_t>, <ordinal>)

Prior to version 2.1, the mongo shell displayed the Timestamp value as a document:

{ t : <time_t>, i : <ordinal> }
[3] If the first field in the document is _id, then you can generate a timestamp in the second field of a document.

In the following example, MongoDB will generate a Timestamp value, even though the Timestamp() constructor is not in the first field in the document:

db.bios.insert( { _id: 9, last_updated: new Timestamp() } )

Date

BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). The official BSON specification refers to the BSON Date type as the UTC datetime.

Changed in version 2.0: BSON Date type is signed. [4] Negative values represent dates before 1970.
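JavaScript's own Date type uses the same representation, a possibly negative count of milliseconds since the Unix epoch, so plain JavaScript can illustrate the signed behavior. This sketch uses only built-in Date methods (note that Date.UTC, like getMonth(), treats months as zero-indexed) plus a BigInt reinterpretation that models the unsigned reading described in footnote [4].

```javascript
// Milliseconds since the Unix epoch (Jan 1, 1970, UTC).
const epoch = Date.UTC(1970, 0, 1);
const birth = Date.UTC(1912, 5, 23); // June 23, 1912 (month 5 is June)
const death = Date.UTC(1954, 5, 7);  // June 7, 1954

console.log(epoch);                      // 0
console.log(birth < 0);                  // true: dates before 1970 are negative
console.log(new Date(-1).toISOString()); // 1969-12-31T23:59:59.999Z

// With signed values, ordinary numeric comparison orders pre-1970 dates correctly.
console.log([death, epoch, birth].sort((x, y) => x - y).map((ms) => new Date(ms).getUTCFullYear()));
// [ 1912, 1954, 1970 ]

// Reading the same 64 bits as unsigned would push pre-1970 dates after 1970.
const asUnsigned = (ms) => BigInt.asUintN(64, BigInt(ms));
console.log(asUnsigned(birth) > asUnsigned(epoch)); // true
```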

Consider the following examples of BSON Date:

  • Construct a Date using the new Date() constructor in the mongo shell:

    var mydate1 = new Date()
    
  • Construct a Date using the ISODate() constructor in the mongo shell:

    var mydate2 = ISODate()
    
  • Return the Date value as string:

    mydate1.toString()
    
  • Return the month portion of the Date value; months are zero-indexed, so that January is month 0:

    mydate1.getMonth()
    
[4] Prior to version 2.0, Date values were incorrectly interpreted as unsigned integers, which affected sorts, range queries, and indexes on Date fields. Because indexes are not recreated when upgrading, re-index if you created an index on Date values with an earlier version and dates before 1970 are relevant to your application.