
Encrypted Fields and Queries

On this page

  • Overview
  • Considerations when Enabling Querying
  • Contention
  • Adjusting the Contention Factor
  • Supported Query Types and Behavior
  • Client and Server Schemas

Overview

When you use Queryable Encryption, you define encrypted fields at the collection level using an encryption schema. Encrypting a field and enabling queries increases storage requirements and impacts query performance.

For instructions on creating an encryption schema and configuring querying, see Create an Encryption Schema.

Considerations when Enabling Querying

Warning

You can choose to make an encrypted field queryable. If you don't need to perform CRUD operations that query an encrypted field, you may not need to enable querying on that field. You can still retrieve the entire document by querying other fields that are queryable or unencrypted.

When you make encrypted fields queryable, MongoDB creates an index for each queryable encrypted field. Because MongoDB also updates the related index whenever a write operation updates an indexed field, write operations on that field can take longer.

When you create an encrypted collection, MongoDB creates two metadata collections, increasing the storage space requirements.

Contention

Concurrent write operations, such as inserting the same field/value pair into multiple documents in close succession, can cause contention: conflicts that delay operations.

With Queryable Encryption, MongoDB tracks the occurrences of each field/value pair in an encrypted collection using an internal counter. The contention factor partitions this counter, similar to an array. This reduces conflicts on the counter when insert, update, or findAndModify operations add or modify the same field/value pair of an encrypted field in close succession. A contention factor of 0 creates an array with one element, at index 0. A contention factor of 4 creates an array with 5 elements, at indexes 0-4. During an insert, MongoDB increments a randomly chosen array element.
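To make the partitioning concrete, here is a toy Python model. It assumes each field/value pair's counter is a plain list of contention + 1 integers and that an insert increments one entry chosen at random; MongoDB's actual server-side metadata structures are different, and the class name PartitionedCounter is hypothetical.

```python
import random


class PartitionedCounter:
    """Toy model of one field/value pair's contention-partitioned counter.

    Assumption for illustration only: the counter is a list of
    contention + 1 integers, and each insert increments one entry chosen
    at random. MongoDB's real metadata structures are different.
    """

    def __init__(self, contention=8, rng=None):
        if contention < 0:
            raise ValueError("contention must be >= 0")
        # contention = 0 -> 1 partition; contention = 4 -> 5 partitions.
        self.partitions = [0] * (contention + 1)
        self._rng = rng if rng is not None else random.Random()

    def insert(self):
        # Writers that land on different partitions don't update the same
        # counter entry, which is how partitioning reduces write conflicts.
        index = self._rng.randrange(len(self.partitions))
        self.partitions[index] += 1
        return index

    def partitions_to_scan(self):
        # A lookup can't tell which partition an earlier insert used, so it
        # has to consider all of them: more partitions, more work per find.
        return len(self.partitions)

    def total(self):
        return sum(self.partitions)


counter = PartitionedCounter(contention=4, rng=random.Random(0))
for _ in range(1000):
    counter.insert()
print(len(counter.partitions))       # 5
print(counter.total())               # 1000
print(counter.partitions_to_scan())  # 5
```

In this model, raising contention spreads inserts over more entries but lengthens the scan each lookup performs, which is the insert-versus-find trade-off this section describes.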

When unset, contention defaults to 8, which provides high performance for most workloads. Higher contention improves the performance of insert and update operations on low cardinality fields, but decreases find performance.

Adjusting the Contention Factor

You can optionally include the contention property on queryable fields to change the contention factor from its default value of 8. Before you modify the contention factor, consider the following points:

Consider increasing contention above the default value of 8 only if the field has frequent concurrent write operations. Since high contention values sacrifice find performance in favor of insert and update operations, the benefit of a high contention factor for a rarely updated field is unlikely to outweigh the drawback.

Consider decreasing contention if a field is often queried, but rarely written. In this case, find performance is preferable to write and update performance.
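As a sketch, here is where the contention property might sit in an encryptedFields document built in Python, assuming the path / bsonType / keyId / queries layout used in the driver examples. The field paths and the value 2 are hypothetical, keyId is left as None on the assumption that a helper such as PyMongo's ClientEncryption.create_encrypted_collection generates the data keys, and effective_contention is a hypothetical helper that applies the documented default of 8.

```python
# Hypothetical encryptedFields document. keyId is None on the assumption
# that a helper such as ClientEncryption.create_encrypted_collection
# generates the data keys; otherwise, set each keyId to a data key's UUID.
encrypted_fields = {
    "fields": [
        {
            # Queryable, contention unset: uses the default of 8.
            "path": "patientRecord.ssn",
            "bsonType": "string",
            "keyId": None,
            "queries": [{"queryType": "equality"}],
        },
        {
            # Queryable, contention lowered; assumes this field is queried
            # often but rarely written (a workload property, not a data one).
            "path": "patientRecord.visitId",
            "bsonType": "string",
            "keyId": None,
            "queries": [{"queryType": "equality", "contention": 2}],
        },
        {
            # No queries entry: encrypted, but not queryable (query type none).
            "path": "patientRecord.notes",
            "bsonType": "string",
            "keyId": None,
        },
    ]
}


def effective_contention(field, default=8):
    """Contention a field would use, or None if it isn't queryable."""
    queries = field.get("queries")
    if not queries:
        return None
    if isinstance(queries, dict):  # tolerate the single-object form
        queries = [queries]
    return queries[0].get("contention", default)


for field in encrypted_fields["fields"]:
    print(field["path"], effective_contention(field))
```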

You can calculate a contention factor for a field by using a formula where:

  • ω is the number of concurrent write operations on the field in a short time, such as 30ms. If unknown, you can use the server's number of virtual cores.

  • valinserts is the number of unique field/value pairs inserted since last performing metadata compaction.

  • ω* is ω/valinserts rounded up to the nearest integer. For a workload of 100 operations with 1000 recent values, 100/1000 = 0.1, which rounds up to 1.

A reasonable contention factor, cf, is the result of the following formula, rounded up to the nearest positive integer:

$$\frac{\omega^{*} \cdot (\omega^{*} - 1)}{0.2}$$

For example, if there are 100 concurrent write operations on a field in 30ms, then ω = 100. If there are 50 recent unique values for that field, then ω* = 100/50 = 2. This results in cf = (2·1)/0.2 = 10.
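The arithmetic can be checked with a small Python helper. The function names are hypothetical, and ω and valinserts are values you estimate from your own workload; the helper only evaluates the formula and rounding rules stated above.

```python
def omega_star(omega, val_inserts):
    """ω* = ω / valinserts, rounded up to the nearest integer."""
    if omega < 1 or val_inserts < 1:
        raise ValueError("omega and val_inserts must be positive integers")
    return -(-omega // val_inserts)  # integer ceiling division


def suggested_contention(omega, val_inserts):
    """cf = ω*·(ω* − 1) / 0.2, rounded up to the nearest positive integer."""
    w = omega_star(omega, val_inserts)
    # Dividing by 0.2 equals multiplying by 5; staying in integers avoids
    # floating-point rounding surprises.
    return max(1, 5 * w * (w - 1))


print(suggested_contention(100, 50))    # 10, matching the worked example
print(suggested_contention(100, 1000))  # 1: ω* = 1, so the formula gives 0
```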

Warning

Don't base the contention factor on properties of the data itself, such as the frequency of field/value pairs (cardinality). Base the contention factor only on your workload.

Consider a case where ω = 100 and valinserts = 1000, so ω* = 100/1000 = 0.1, which rounds up to 1, and cf = (1·0)/0.2 = 0, which rounds up to the nearest positive integer, 1. Suppose 20 of the values appear very frequently, so you set contention = 3 instead. An attacker with access to multiple database snapshots can infer that the raised setting indicates frequent field/value pairs. In this case, leaving contention unset so that it defaults to 8 would prevent the attacker from having that information.

For detailed information on contention and its cryptographic implications, see "Section 9: Guidelines" in MongoDB's Queryable Encryption Technical Paper.

Supported Query Types and Behavior

Querying non-encrypted fields or encrypted fields with a supported query type returns encrypted data that is then decrypted at the client.

Queryable Encryption currently supports none and equality query types. If the query type is unspecified, it defaults to none. If the query type is none, the field is encrypted, but clients can't query it.

The equality query type supports the following expressions:

Note

Queries that compare an encrypted field to null or to a regular expression result in an error, even with supported query operators.

Queryable Encryption equality queries don't support read or write operations on a field when the operation compares the encrypted field to any of the following BSON types:

  • double

  • decimal128

  • object

  • array
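The restrictions above can be expressed as a client-side pre-check. This is a hypothetical helper, not part of any driver: it approximates the BSON type PyMongo would encode a built-in Python value as, using $type aliases, and rejects the comparison types listed here. The operator subset it models ($eq, $ne, $in, $nin) is an assumption of the sketch, not a restatement of the supported-expression list.

```python
import datetime
import re

# $type aliases of comparison values that equality queries on encrypted
# fields reject, per the note and list above.
UNSUPPORTED = {"null", "regex", "double", "decimal", "object", "array"}


def bson_type_alias(value):
    """Approximate the BSON type PyMongo encodes a built-in Python value as.

    Only common built-ins are covered; bson classes such as Decimal128 are
    left out so the sketch needs nothing beyond the standard library.
    """
    if value is None:
        return "null"
    if isinstance(value, bool):  # test before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int" if -2**31 <= value < 2**31 else "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, re.Pattern):
        return "regex"
    if isinstance(value, datetime.datetime):
        return "date"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, (list, tuple)):
        return "array"
    raise TypeError(f"{type(value).__name__} is not mapped in this sketch")


def check_encrypted_equality(path, condition):
    """Raise ValueError if a condition compares an encrypted field to an
    unsupported type. Models only bare values, $eq, $ne, $in, and $nin."""
    operands = [condition]
    if isinstance(condition, dict) and len(condition) == 1:
        op, arg = next(iter(condition.items()))
        if op in ("$eq", "$ne"):
            operands = [arg]
        elif op in ("$in", "$nin"):
            operands = list(arg)  # each element is compared to the field
        elif op.startswith("$"):
            raise NotImplementedError(f"{op} is not modeled in this sketch")
        # Otherwise it's an embedded document, checked as an object below.
    for operand in operands:
        alias = bson_type_alias(operand)
        if alias in UNSUPPORTED:
            raise ValueError(f"cannot compare encrypted field {path!r} to {alias}")


check_encrypted_equality("patientRecord.ssn", "901-01-0001")        # accepted
check_encrypted_equality("patientRecord.ssn", {"$in": ["a", "b"]})  # accepted
```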

Client and Server Schemas

MongoDB supports using schema validation to enforce encryption of specific fields in a collection. Clients using automatic Queryable Encryption behave differently depending on the database connection configuration:

  • If the connection encryptedFieldsMap object contains a key for the specified collection, the client uses that object to perform automatic Queryable Encryption, rather than using the remote schema. At minimum, the local rules must encrypt all fields that the remote schema does.

  • If the connection encryptedFieldsMap object doesn't contain a key for the specified collection, the client downloads the server-side remote schema for the collection and uses it instead.

    Important

    Remote Schema Behavior

    When using a remote schema:

    • The client trusts that the server has a valid schema.

    • The client uses the remote schema to perform automatic Queryable Encryption only. The client does not enforce any other validation rules specified in the schema.
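The client's choice between local and remote rules, and the requirement that local rules encrypt at least the fields the remote schema does, can be modeled in a few lines of Python. This is not driver code: the function names, the namespace medicalRecords.patients, and the field paths are hypothetical, and comparing field paths is a simplification of what "encrypt all fields" entails.

```python
def select_encrypted_fields(encrypted_fields_map, namespace, fetch_remote_schema):
    """Choose local or remote encryption rules for a collection.

    encrypted_fields_map is keyed by "<database>.<collection>" namespaces.
    fetch_remote_schema is a hypothetical stand-in for the driver
    downloading the collection's server-side schema.
    """
    if namespace in encrypted_fields_map:
        return "local", encrypted_fields_map[namespace]
    return "remote", fetch_remote_schema(namespace)


def missing_from_local(local_fields, remote_fields):
    """Paths the remote schema encrypts that the local rules don't.

    A non-empty result means the local rules don't meet the minimum of
    encrypting every field the remote schema does.
    """
    local_paths = {f["path"] for f in local_fields.get("fields", [])}
    remote_paths = {f["path"] for f in remote_fields.get("fields", [])}
    return sorted(remote_paths - local_paths)


server_side = {
    "medicalRecords.patients": {
        "fields": [{"path": "patientRecord.ssn"}, {"path": "patientRecord.billing"}]
    }
}
local_map = {"medicalRecords.patients": {"fields": [{"path": "patientRecord.ssn"}]}}

source, rules = select_encrypted_fields(local_map, "medicalRecords.patients", server_side.get)
print(source)  # local
print(missing_from_local(rules, server_side["medicalRecords.patients"]))
# ['patientRecord.billing']
print(select_encrypted_fields({}, "medicalRecords.patients", server_side.get)[0])  # remote
```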
