
Collations

On this page

  • Overview
  • Collations in MongoDB
  • How to Specify Collations
  • Collection
  • Index
  • Operation
  • Index Types That Do Not Support Collations
  • Collation Options
  • Collation Examples
  • find() and sort() Example
  • findOneAndUpdate() Example
  • findOneAndDelete() Example
  • Aggregation Example

Overview

In this guide, you can learn how to use collations with MongoDB to order your query or aggregation operation results by string values. A collation is a set of character ordering and matching rules that apply to a specific language and locale.

You can learn more about collations in the following sections of this guide:

  • Collations in MongoDB

  • How to Specify Collations

  • Collation Options

  • Collation Examples

Collations in MongoDB

MongoDB sorts strings using binary collation by default. The binary collation uses the ASCII standard character values to compare and order strings. Certain languages and locales have specific character ordering conventions that differ from the ASCII character values.

For example, in Canadian French, the right-most accented character (diacritic) determines the ordering for strings when all preceding characters are the same. Consider the following Canadian French words:

  • cote

  • coté

  • côte

  • côté

When using binary collation, MongoDB sorts them in the following order:

cote
coté
côte
côté

When using the Canadian French collation, MongoDB sorts them in a different order as shown below:

cote
côte
coté
côté
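The following JDK-only sketch sorts the same four words both ways to make the difference concrete. It is an illustration, not how MongoDB or ICU implement collation: String.compareTo stands in for binary comparison (for these characters, UTF-16 code-unit order agrees with UTF-8 byte order), and the hypothetical FRENCH_BACKWARDS comparator is a simplified model that compares base letters first and, only on a tie, scans for accents starting from the end of the word. It ignores which accent a letter carries and any difference in case.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CanadianFrenchOrder {
    // True for Unicode combining marks (categories Mn, Mc, Me)
    static boolean isMark(char c) {
        int type = Character.getType(c);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    // Removes diacritics so that only the base letters remain ("côté" -> "cote")
    static String baseLetters(String s) {
        StringBuilder base = new StringBuilder();
        for (char c : Normalizer.normalize(s, Normalizer.Form.NFD).toCharArray()) {
            if (!isMark(c)) {
                base.append(c);
            }
        }
        return base.toString();
    }

    // Flags each base letter with 1 if a diacritic follows it, otherwise 0
    static int[] accentFlags(String s) {
        List<Integer> flags = new ArrayList<>();
        for (char c : Normalizer.normalize(s, Normalizer.Form.NFD).toCharArray()) {
            if (!isMark(c)) {
                flags.add(0);
            } else if (!flags.isEmpty()) {
                flags.set(flags.size() - 1, 1); // the preceding base letter is accented
            }
        }
        return flags.stream().mapToInt(Integer::intValue).toArray();
    }

    // Simplified model: compare base letters, then break ties by scanning accents from the right
    static final Comparator<String> FRENCH_BACKWARDS = (a, b) -> {
        int byBase = baseLetters(a).compareTo(baseLetters(b));
        if (byBase != 0) {
            return byBase;
        }
        int[] x = accentFlags(a);
        int[] y = accentFlags(b);
        for (int i = x.length - 1; i >= 0; i--) { // same base letters, so same length
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]); // the unaccented letter sorts first
            }
        }
        return 0;
    };

    public static void main(String[] args) {
        List<String> binary = new ArrayList<>(List.of("côté", "coté", "côte", "cote"));
        binary.sort(Comparator.naturalOrder());
        System.out.println(binary); // [cote, coté, côte, côté]

        List<String> french = new ArrayList<>(List.of("côté", "coté", "côte", "cote"));
        french.sort(FRENCH_BACKWARDS);
        System.out.println(french); // [cote, côte, coté, côté]
    }
}
```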

MongoDB supports collations on most CRUD operations and aggregations. For a complete list of supported operations, see the Operations that Support Collations server manual page.

You can specify the locale code and optional variant in the following string format:

"<locale code>@collation=<variant code>"

The following example specifies the "de" locale code and "phonebook" variant code:

"de@collation=phonebook"

If you do not need to specify a variant, omit everything after the locale code as follows:

"de"

For a complete list of supported locales, see our server manual page on Supported Languages and Locales.
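If your application assembles this string at runtime, a small helper like the following can keep the format consistent. The CollationLocaleString class and its build() and variantOf() methods are hypothetical illustrations, not part of the driver; the string that build() returns is the kind of value you pass to Collation.builder().locale(), as in the examples later in this guide.

```java
import java.util.Optional;

class CollationLocaleString {
    private static final String VARIANT_MARKER = "@collation=";

    // Builds "<locale code>@collation=<variant code>", or just the locale code when there is no variant
    static String build(String localeCode, String variantCode) {
        if (variantCode == null || variantCode.isEmpty()) {
            return localeCode;
        }
        return localeCode + VARIANT_MARKER + variantCode;
    }

    // Returns the variant code, or an empty Optional when the string has no variant
    static Optional<String> variantOf(String collationLocale) {
        int at = collationLocale.indexOf(VARIANT_MARKER);
        if (at < 0) {
            return Optional.empty();
        }
        return Optional.of(collationLocale.substring(at + VARIANT_MARKER.length()));
    }

    public static void main(String[] args) {
        System.out.println(build("de", "phonebook"));                  // de@collation=phonebook
        System.out.println(build("de", null));                         // de
        System.out.println(variantOf("de@collation=phonebook").get()); // phonebook
        System.out.println(variantOf("de").isPresent());               // false
    }
}
```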

How to Specify Collations

The following sections show you different ways to apply collations in MongoDB:

  • Collection

  • Index

  • Operation

Collection

You can set a default collation when you create a collection. When you create a collection with a specified collation, all supported operations that run on that collection apply the rules of the collation.

You can only assign a default collation to a collection when you create that collection. However, you can specify a collation in a new index on an existing collection. See the Index section of this guide for more information.

The following snippet shows how to specify the "en_US" locale collation when creating a new collection called items:

// Creates the "items" collection with "en_US" as its default collation
database.createCollection(
"items",
new CreateCollectionOptions().collation(
Collation.builder().locale("en_US").build()));

Because the collection's default _id index inherits the collection's default collation, you can check whether you set the collation successfully by retrieving a list of the indexes on that collection as follows:

MongoCollection<Document> collection = database.getCollection("items");
List<Document> indexes = new ArrayList<>();
collection.listIndexes().into(indexes);
// Prints the collection's indexes and any default collations
indexes.forEach(idx -> System.out.println(idx.toJson()));

The output of your code should contain the following:

{ ...
"collation": { "locale": "en_US", ... }
...
}

Index

You can specify a collation when you create a new index on a collection. The index stores an ordered representation of the documents in the collection, so your operation does not need to perform the ordering in memory. To use the index, your operation must meet the following criteria:

  • The operation uses the same collation as the one specified in the index.

  • The operation is covered by the index that contains the collation.

The following code snippet shows how you can create an index on the "name" field with the "en_US" locale collation in ascending order:

MongoCollection<Document> collection = database.getCollection("items");
IndexOptions idxOptions = new IndexOptions();
// Defines options that set a collation locale
idxOptions.collation(Collation.builder().locale("en_US").build());
// Creates an index on the "name" field with the collation and ascending sort order
collection.createIndex(Indexes.ascending("name"), idxOptions);

To check whether you created the collation successfully, retrieve a list of the indexes on that collection as follows:

MongoCollection<Document> collection = database.getCollection("items");
List<Document> indexes = new ArrayList<>();
collection.listIndexes().into(indexes);
// Prints the collection's indexes and any default collations
indexes.forEach(idx -> System.out.println(idx.toJson()));

The output of the preceding code should contain the following:

{ ...
"collation": { "locale": "en_US", ... }
...
}

The following code snippet shows an example operation that specifies the same collation and is covered by the index we created in the preceding code snippet:

FindIterable<Document> cursor = collection.find()
.collation(Collation.builder().locale("en_US").build())
.sort(Sorts.ascending("name"));

Operation

You can override the default collation on a collection by passing a different collation as a parameter to one of the supported operations. However, if no index uses that collation, the operation cannot use an index to order its results and may not perform as well as an operation that an index covers. For more information on the disadvantages of sorting operations not covered by an index, see the server manual page on Use Indexes to Sort Query Results.

The following code snippet shows an example query operation with the following characteristics:

  • The referenced collection has the default collation "en_US", similar to the one specified in the Collection section.

  • The query specifies the Icelandic ("is") collation, which no index on the collection uses.

  • Because no index uses the specified collation, MongoDB performs the sort operation in memory.

FindIterable<Document> cursor = collection.find()
.collation(Collation.builder().locale("is").build())
.sort(Sorts.ascending("name"));

Index Types That Do Not Support Collations

While most MongoDB index types support collation, the following types support only binary comparison:

  • text indexes

  • 2d indexes

  • geoHaystack indexes

Collation Options

This section covers various collation options and how to specify them to further refine the ordering and matching behavior.

The following list describes each collation option, followed by the Collation.Builder method that sets it in parentheses. For details, see the API documentation for each method.

  • Locale (locale()): Required. The ICU locale code for language and variant.

  • Backwards (backwards()): Whether to consider diacritics from the end of the string first.

  • Case-sensitivity (caseLevel()): Whether to consider case (upper or lower) as different values.

  • Alternate (collationAlternate()): Whether to consider spaces and punctuation.

  • Case First (collationCaseFirst()): Whether to consider uppercase or lowercase first.

  • Max Variable (collationMaxVariable()): Whether to ignore whitespace or both whitespace and punctuation. This setting is only valid when the alternate setting is "shifted".

  • Strength (collationStrength()): ICU level of comparison. The default value is "tertiary". For more information on each level, see ICU Comparison Levels.

  • Normalization (normalization()): Whether to perform Unicode normalization on the text as needed. For more information on Unicode normalization, see Unicode Normalization Forms.

  • Numeric Ordering (numericOrdering()): Whether to order numbers according to numeric value rather than collation order.

You can use the Collation.Builder class to specify values for the preceding collation options. You can call the build() method to construct a Collation object as shown in the following code snippet:

Collation.builder()
.caseLevel(true)
.collationAlternate(CollationAlternate.SHIFTED)
.collationCaseFirst(CollationCaseFirst.UPPER)
.collationMaxVariable(CollationMaxVariable.SPACE)
.collationStrength(CollationStrength.SECONDARY)
.locale("en_US")
.normalization(false)
.numericOrdering(true)
.build();

For more information on the corresponding methods and parameters they take, see the API Documentation for Collation.Builder.
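The Strength option decides which kinds of differences count when two strings are compared. The following JDK-only sketch illustrates the idea with java.text.Collator, which has its own PRIMARY, SECONDARY, and TERTIARY strengths. It is an analogy rather than the driver's API: java.text.Collator is a separate implementation from the ICU library behind MongoDB collations, so results for other strings and locales can differ. The equalAt() helper is hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthLevels {
    // Reports whether two strings compare as equal at the given java.text.Collator strength
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION); // treat "ü" as "u" + combining umlaut
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // Accent difference: ignored at PRIMARY, significant from SECONDARY on
        System.out.println(equalAt(Collator.PRIMARY, "Gunter", "Günter"));   // true
        System.out.println(equalAt(Collator.SECONDARY, "Gunter", "Günter")); // false
        // Case difference: ignored at SECONDARY, significant at TERTIARY
        System.out.println(equalAt(Collator.SECONDARY, "gunter", "Gunter")); // true
        System.out.println(equalAt(Collator.TERTIARY, "gunter", "Gunter"));  // false
    }
}
```

The same idea underlies the aggregation example later in this guide, which sets CollationStrength.PRIMARY so that accents and umlauts are ignored.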

Collation Examples

This section contains examples that demonstrate how to use a selection of MongoDB operations that support collations. For each example, assume that you start with the following collection of documents:

{ "_id" : 1, "first_name" : "Klara" }
{ "_id" : 2, "first_name" : "Gunter" }
{ "_id" : 3, "first_name" : "Günter" }
{ "_id" : 4, "first_name" : "Jürgen" }
{ "_id" : 5, "first_name" : "Hannah" }

In the following examples, we specify the "de@collation=phonebook" locale and variant collation. The "de" part of the collation specifies the German locale and the "collation=phonebook" part specifies a variant. The "de" locale collation contains rules for prioritizing proper nouns, identified by capitalization of the first letter. In the "collation=phonebook" variant, a vowel with an umlaut sorts as if it were the vowel followed by "e" (for example, "ü" sorts like "ue"), so "Günter" sorts before "Gunter" in an ascending sort.

find() and sort() Example

The following example demonstrates how you can apply a collation when retrieving sorted results from a collection. To perform this operation, call find() on the example collection and chain the collation() and sort() methods to specify the order in which you want to receive the results.

Note

The following code example uses imports from the com.mongodb.client.model package for convenience.

List<Document> results = new ArrayList<>();
// Retrieves all documents and applies a "de@collation=phonebook" collation and ascending sort to the results
collection.find()
.collation(Collation.builder().locale("de@collation=phonebook").build())
.sort(Sorts.ascending("first_name"))
.into(results);
// Prints the JSON representation of the results
results.forEach(doc -> System.out.println(doc.toJson()));

When we perform this operation on our example collection, the output should resemble the following:

{"_id": 3, "first_name": "Günter"}
{"_id": 2, "first_name": "Gunter"}
{"_id": 5, "first_name": "Hannah"}
{"_id": 4, "first_name": "Jürgen"}
{"_id": 1, "first_name": "Klara"}
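A simplified way to see why "Günter" comes first: the phonebook variant treats a vowel with an umlaut like the vowel followed by "e", so "Günter" is compared roughly as "Guenter", which precedes "Gunter". The following JDK-only sketch models just that expansion. The PhonebookSortSketch class and its PHONEBOOK comparator are hypothetical, and the plain String comparison after expansion stands in for ICU's full German rules, so it is only reliable for simple names such as these.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PhonebookSortSketch {
    // Expands each umlaut to its base vowel followed by "e" ("Günter" -> "Guenter")
    static String expand(String s) {
        return s.replace("ä", "ae").replace("ö", "oe").replace("ü", "ue")
                .replace("Ä", "Ae").replace("Ö", "Oe").replace("Ü", "Ue");
    }

    // Simplified phonebook order: compare expanded names, then the original strings to break ties
    static final Comparator<String> PHONEBOOK =
            Comparator.comparing((String s) -> expand(s))
                    .thenComparing(Comparator.<String>naturalOrder());

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("Klara", "Gunter", "Günter", "Jürgen", "Hannah"));
        names.sort(PHONEBOOK);
        System.out.println(names); // [Günter, Gunter, Hannah, Jürgen, Klara]
    }
}
```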

For more information about the methods and classes mentioned in this section, see the API Documentation for find(), collation(), sort(), and Sorts.

findOneAndUpdate() Example

This section demonstrates how you can specify a collation in an operation that updates the first match from your query. To specify the collation for this operation, instantiate a FindOneAndUpdateOptions object, set a collation on it, and pass it as a parameter to your call to the findOneAndUpdate() method.

In this example, we demonstrate the following:

  • Retrieve the first document in our example collection that precedes "Gunter" in ascending order.

  • Set options for the operation, including the "de@collation=phonebook" collation.

  • Add a new field "verified" with the boolean value true.

  • Retrieve and print the updated document.

Note

The following code example uses imports from the com.mongodb.client.model package for convenience.

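// Updates the first document, in ascending phonebook order, whose first_name sorts before "Gunter"
Document result = collection.findOneAndUpdate(
Filters.lt("first_name", "Gunter"),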
Updates.set("verified", true),
new FindOneAndUpdateOptions()
.collation(Collation.builder().locale("de@collation=phonebook").build())
.sort(Sorts.ascending("first_name"))
.returnDocument(ReturnDocument.AFTER));
// Prints the JSON representation of the updated document if an update occurred
if (result != null) {
System.out.println("Updated document: " + result.toJson());
}

Since "Günter" sorts before "Gunter" under the de@collation=phonebook collation in ascending order, the preceding operation should print output that resembles the following:

Updated document: {"_id": 3, "first_name": "Günter", "verified": true}
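Under the phonebook collation, only "Günter" sorts before "Gunter", so a less-than filter is what selects it; a greater-than filter would instead match "Hannah", "Jürgen", and "Klara", and the ascending sort would pick "Hannah". The following JDK-only sketch checks both filters against the example names. Its PHONEBOOK comparator is a hypothetical simplification (lowercase umlauts expanded to vowel plus "e"), and firstLessThan() and firstGreaterThan() are hypothetical helpers that only mimic a filter followed by an ascending sort.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class PhonebookFilterSketch {
    // Simplified phonebook model: compare names as if each umlaut were its vowel plus "e"
    static final Comparator<String> PHONEBOOK =
            Comparator.comparing((String s) -> s.replace("ä", "ae").replace("ö", "oe").replace("ü", "ue"))
                    .thenComparing(Comparator.<String>naturalOrder());

    // Mimics a less-than filter plus ascending sort: the first name that sorts before value
    static Optional<String> firstLessThan(List<String> names, String value) {
        return names.stream().filter(n -> PHONEBOOK.compare(n, value) < 0).min(PHONEBOOK);
    }

    // Mimics a greater-than filter plus ascending sort: the first name that sorts after value
    static Optional<String> firstGreaterThan(List<String> names, String value) {
        return names.stream().filter(n -> PHONEBOOK.compare(n, value) > 0).min(PHONEBOOK);
    }

    public static void main(String[] args) {
        List<String> names = List.of("Klara", "Gunter", "Günter", "Jürgen", "Hannah");
        System.out.println(firstLessThan(names, "Gunter").orElse("(no match)"));    // Günter
        System.out.println(firstGreaterThan(names, "Gunter").orElse("(no match)")); // Hannah
    }
}
```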

For more information about the methods and classes mentioned in this section, see the API Documentation for findOneAndUpdate(), FindOneAndUpdateOptions, ReturnDocument, Filters, and Updates.

findOneAndDelete() Example

This section demonstrates how you can specify a numerical ordering of strings in a collation in an operation that deletes the first match from your query. To specify the collation for this operation, instantiate a FindOneAndDeleteOptions object, set a numeric ordering collation on it, and pass it as a parameter to your call to the findOneAndDelete() method.

This example calls the findOneAndDelete() operation on a collection that contains the following documents:

{ "_id" : 1, "a" : "16 apples" }
{ "_id" : 2, "a" : "84 oranges" }
{ "_id" : 3, "a" : "179 bananas" }

In the collation, we set the locale option to "en" and the numericOrdering option to true so that strings are sorted by the numeric value of the digits they contain.

Note

The following code example uses imports from the com.mongodb.client.model package for convenience.

Document result = collection.findOneAndDelete(
Filters.gt("a", "100"),
new FindOneAndDeleteOptions()
.collation(
Collation.builder()
.locale("en")
.numericOrdering(true)
.build())
.sort(Sorts.ascending("a")));
// Prints the JSON representation of the deleted document
if (result != null) {
System.out.println("Deleted document: " + result.toJson());
}

After you run the preceding operation, your output should resemble the following:

Deleted document: {"_id": 3, "a": "179 bananas"}

The numeric value of the string "179" is greater than the number 100, so the preceding document is the only match.

If we perform the same operation without the numerical ordering collation on the original collection of three documents, the filter matches all of our documents since "100" comes before "16", "84", and "179" when ordering by binary collation.
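The following JDK-only sketch reproduces both outcomes. The NUMERIC comparator is a hypothetical, simplified stand-in for the numericOrdering option: it compares runs of ASCII digits by value and every other character by its UTF-16 code unit, while String's natural order stands in for binary comparison. It also shows that, under binary comparison, the ascending sort would make "16 apples" the first match, so that is the document the operation would delete. The greaterThan() helper is likewise hypothetical.

```java
import java.math.BigInteger;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class NumericOrderingSketch {
    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Compares runs of ASCII digits by numeric value and all other characters one by one
    static final Comparator<String> NUMERIC = (a, b) -> {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            if (isAsciiDigit(a.charAt(i)) && isAsciiDigit(b.charAt(j))) {
                int startA = i;
                int startB = j;
                while (i < a.length() && isAsciiDigit(a.charAt(i))) {
                    i++;
                }
                while (j < b.length() && isAsciiDigit(b.charAt(j))) {
                    j++;
                }
                int byValue = new BigInteger(a.substring(startA, i))
                        .compareTo(new BigInteger(b.substring(startB, j)));
                if (byValue != 0) {
                    return byValue;
                }
            } else {
                if (a.charAt(i) != b.charAt(j)) {
                    return Character.compare(a.charAt(i), b.charAt(j));
                }
                i++;
                j++;
            }
        }
        // When one string runs out first, it sorts first
        return Integer.compare(a.length() - i, b.length() - j);
    };

    // Mimics a greater-than filter: keeps values that sort after the bound, in ascending order
    static List<String> greaterThan(List<String> values, String bound, Comparator<String> order) {
        return values.stream()
                .filter(v -> order.compare(v, bound) > 0)
                .sorted(order)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> docs = List.of("16 apples", "84 oranges", "179 bananas");
        System.out.println(greaterThan(docs, "100", NUMERIC));                   // [179 bananas]
        System.out.println(greaterThan(docs, "100", Comparator.naturalOrder())); // [16 apples, 179 bananas, 84 oranges]
    }
}
```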

For more information about the methods and classes mentioned in this section, see the API Documentation for findOneAndDelete(), FindOneAndDeleteOptions, Collation.Builder, and Filters.

Aggregation Example

This section demonstrates how you can specify a collation in an aggregation operation. In an aggregation operation, you can specify a series of aggregation stages, which are collectively called the aggregation pipeline. To perform an aggregation, call the aggregate() method on a MongoCollection object.

To specify a collation for an aggregation operation, call the collation() method on the AggregateIterable returned by the aggregation operation. Make sure to specify a sort aggregation stage on which to apply the collation in your aggregation pipeline.

The following example shows how we can construct an aggregation pipeline on the example collection and apply a collation by specifying the following:

  • A group aggregation stage using the Aggregates.group() helper to identify each document by the first_name field and use that value as the _id of the result.

  • An accumulator in the group aggregation stage to sum the number of instances of matching values in the first_name field.

  • A sort aggregation stage that applies an ascending sort to the _id field of the output documents of the prior aggregation stage.

  • A collation object that specifies the German locale and a collation strength that ignores accents and umlauts.

Bson groupStage = Aggregates.group("$first_name", Accumulators.sum("nameCount", 1));
Bson sortStage = Aggregates.sort(Sorts.ascending("_id"));
AggregateIterable<Document> results = collection
// Runs the aggregation pipeline that tallies "first_name" frequencies
.aggregate(Arrays.asList(groupStage, sortStage))
// Applies a German-locale collation at primary strength, which ignores accents and umlauts
.collation(Collation.builder().locale("de").collationStrength(CollationStrength.PRIMARY).build());
// Prints the JSON representation of the results
results.forEach(doc -> System.out.println(doc.toJson()));

The preceding code should output documents that resemble the following:

{"_id": "Gunter", "nameCount": 2}
{"_id": "Hannah", "nameCount": 1}
{"_id": "Jürgen", "nameCount": 1}
{"_id": "Klara", "nameCount": 1}
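To see why "Gunter" and "Günter" collapse into a single group, the following JDK-only sketch counts the example names in a TreeMap ordered by a java.text.Collator set to PRIMARY strength, so keys that differ only by accents or case share one entry. It is an analogy for the aggregation above, not the driver's API: java.text.Collator is a separate implementation from ICU, and the countByName() helper is hypothetical. In this sketch, the map keeps whichever spelling it sees first.

```java
import java.text.Collator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class PrimaryStrengthGrouping {
    // Counts names, treating names that differ only by accents or case as the same key
    static Map<String, Integer> countByName(List<String> names) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION); // split "ü" into "u" + combining umlaut
        collator.setStrength(Collator.PRIMARY); // compare base letters only
        // Collator implements Comparator<Object>, so it can order the TreeMap keys
        Map<String, Integer> counts = new TreeMap<>(collator);
        for (String name : names) {
            counts.merge(name, 1, Integer::sum); // an equal key keeps its first spelling
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Klara", "Gunter", "Günter", "Jürgen", "Hannah");
        System.out.println(countByName(names)); // {Gunter=2, Hannah=1, Jürgen=1, Klara=1}
    }
}
```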

For more information about the methods and classes mentioned in this section, see the API Documentation for aggregate(), AggregateIterable.collation(), Aggregates, Accumulators, and CollationStrength.
