Collations

Overview

In this guide, you can learn how to use collations with MongoDB to order your query or aggregation operation results by string values. A collation is a set of character ordering and matching rules that apply to a specific language and locale.

You can learn more about collations in the following sections of this guide:

Collations in MongoDB
How to Specify Collations
Collation Options
Collation Examples

Important

Project Reactor Library

This guide uses the Project Reactor library to consume Publisher instances returned by the Java Reactive Streams driver methods. To learn more about the Project Reactor library and how to use it, see Getting Started in the Reactor documentation. To learn more about how we use Project Reactor library methods in this guide, see the Write Data to MongoDB guide.

Collations in MongoDB

MongoDB sorts strings using binary collation by default. The binary collation uses the ASCII standard character values to compare and order strings. Certain languages and locales have specific character ordering conventions that differ from the ASCII character values.

For example, in Canadian French, the right-most accented character (diacritic) determines the ordering for strings when all preceding characters are the same. Consider the following Canadian French words:

cote
coté
côte
côté

When using binary collation, MongoDB sorts them in the following order:

cote
coté
côte
côté

When using the Canadian French collation, MongoDB sorts them in the following order:

cote
côte
coté
côté
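
The two orderings above can be reproduced without a database. The following is a minimal sketch, assuming plain Java string comparison as a stand-in for binary collation and a hand-written comparator as a simplified model of the right-to-left diacritic rule. The class and method names are hypothetical, and the code does not call MongoDB or ICU:

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FrenchOrderingSketch {
    // The four example words (cote, coté, côte, côté) in shuffled order.
    // Unicode escapes keep the source independent of file encoding.
    static final List<String> WORDS =
            List.of("c\u00f4t\u00e9", "cot\u00e9", "c\u00f4te", "cote");

    // Removes diacritics: decompose to NFD, then drop combining marks.
    static String stripDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
    }

    // Simplified model (an assumption, not ICU): compare base letters first,
    // then break ties by scanning accented characters from right to left.
    static final Comparator<String> BACKWARDS_ACCENTS = (a, b) -> {
        int byBase = stripDiacritics(a).compareTo(stripDiacritics(b));
        if (byBase != 0) {
            return byBase;
        }
        String x = Normalizer.normalize(a, Normalizer.Form.NFC);
        String y = Normalizer.normalize(b, Normalizer.Form.NFC);
        for (int i = x.length() - 1, j = y.length() - 1; i >= 0 && j >= 0; i--, j--) {
            int diff = Character.compare(x.charAt(i), y.charAt(j));
            if (diff != 0) {
                return diff;
            }
        }
        return Integer.compare(x.length(), y.length());
    };

    // Stand-in for binary collation: Java's natural String order.
    static List<String> binaryOrder(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    static List<String> backwardsAccentOrder(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(BACKWARDS_ACCENTS);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println("Binary-style:      " + binaryOrder(WORDS));
        System.out.println("Backwards accents: " + backwardsAccentOrder(WORDS));
    }
}
```

The sketch handles only this illustration. A real collation also accounts for case, contractions, and locale-specific tailoring.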

How to Specify Collations

MongoDB supports collations on most CRUD operations and aggregations. For a complete list of supported operations, see Operations that Support Collations in the MongoDB Server manual.

You can specify the locale code and optional variant in the following string format:

"<locale code>@collation=<variant code>"

The following example specifies the "de" locale code and "phonebook" variant code:

"de@collation=phonebook"

If you do not specify a variant, use only the locale code.
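
As a small illustration of the string format described above, the following sketch assembles the locale string from its parts. The `localeString` helper is hypothetical and is not part of the driver API. It only shows how the locale code and the optional variant code combine:

```java
class CollationLocaleStrings {
    // Hypothetical helper: builds "<locale code>@collation=<variant code>",
    // or returns only the locale code when no variant is given.
    static String localeString(String localeCode, String variantCode) {
        if (variantCode == null || variantCode.isEmpty()) {
            return localeCode;
        }
        return localeCode + "@collation=" + variantCode;
    }

    public static void main(String[] args) {
        System.out.println(localeString("de", "phonebook")); // de@collation=phonebook
        System.out.println(localeString("en_US", null));     // en_US
    }
}
```

You can pass the resulting string to the locale() method of Collation.Builder in the same way as the literal strings used elsewhere in this guide.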

For a complete list of supported locales, see Supported Languages and Locales in the MongoDB Server manual.

The following sections show different ways to apply collations in MongoDB:

Collection
Index
Operation

Collection

You can set a default collation on a collection only when you create the collection. All supported operations that scan the collection then apply the default collation. To use a different collation on an existing collection, you can specify the collation in a new index. See the Index section of this guide for more information.

The following example shows how to specify the "en_US" locale collation when creating a new collection called items:

Mono.from(database.createCollection(
                "items",
                new CreateCollectionOptions().collation(
                        Collation.builder().locale("en_US").build())))
        .block();

To check whether you created the collation successfully, retrieve a list of the indexes on that collection as follows:

List<Document> indexes = Flux.from(itemsCollection.listIndexes())
        .collectList().block();
if (indexes != null) {
    indexes.forEach(idx -> System.out.println(idx.toJson()));
}

The output of the preceding code should contain the following:

{ ...
  "collation": { "locale": "en_US", ... }
  ...
}

Index

You can specify a collation when you create a new index on a collection. The index stores documents in the specified order, eliminating the need for in-memory sorting during queries. To use the index, the operation must use the same collation as specified in the index and be covered by that index.

The following example shows how to create an index on the "name" field with the "en_US" locale collation in ascending order:

IndexOptions idxOptions = new IndexOptions();
idxOptions.collation(Collation.builder().locale("en_US").build());
Mono.from(itemsCollection.createIndex(
        Indexes.ascending("name"), idxOptions)).block();

To check whether you created the collation successfully, retrieve a list of the indexes on that collection as follows:

List<Document> indexes = Flux.from(itemsCollection.listIndexes())
        .collectList().block();
if (indexes != null) {
    indexes.forEach(idx -> System.out.println(idx.toJson()));
}

The output of the preceding code should contain the following:

{ ...
  "collation": { "locale": "en_US", ... }
  ...
}

The following example shows an operation that specifies the same collation and is covered by the index created in the preceding example:

FindPublisher<Document> indexPublisher = itemsCollection.find()
        .collation(Collation.builder().locale("en_US").build())
        .sort(Sorts.ascending("name"));
Flux.from(indexPublisher)
        .doOnNext(doc -> System.out.println(doc.toJson()))
        .blockLast();

Operation

You can override the default collation by passing a new collation to a supported operation. However, without an index, your query performs in-memory sorting, which is slower than using an indexed collation. For more information about the disadvantages of sorting operations not covered by an index, see Use Indexes to Sort Query Results in the MongoDB Server manual.

The following example shows a query operation with the following characteristics:

The referenced collection has a default "en_US" collation index, similar to the one specified in the Collection section.
The query specifies the Icelandic ("is") collation. Because this differs from the index collation, the query does not use the index and performs an in-memory sort instead.

FindPublisher<Document> customPublisher = itemsCollection.find()
        .collation(Collation.builder().locale("is").build())
        .sort(Sorts.ascending("name"));
Flux.from(customPublisher)
        .doOnNext(doc -> System.out.println(doc.toJson()))
        .blockLast();

Index Types That Do Not Support Collations

Most MongoDB index types support collation. However, the following types support only binary comparison and do not support collation:

text
2d

Collation Options

This section covers various collation options and how to specify them to further refine the ordering and matching behavior.

Collation Option	Description	Builder Method
Locale	Required. The ICU locale code for language and variant.	locale()
Backwards	Specifies whether to consider diacritics from the end of the string first.	backwards()
Case-sensitivity	Specifies whether to consider case (upper or lower) as different values.	caseLevel()
Alternate	Specifies whether to consider spaces and punctuation.	collationAlternate()
Case First	Specifies whether to consider uppercase or lowercase first.	collationCaseFirst()
Max Variable	Specifies whether to ignore whitespace or both whitespace and punctuation. This setting is only valid when the alternate setting is "shifted".	collationMaxVariable()
Strength	Specifies the ICU comparison level. The default value is "tertiary". For more information about each level, see the ICU Comparison Levels.	collationStrength()
Normalization	Specifies whether to perform Unicode normalization on the text as needed. For more information about Unicode normalization, see Unicode Normalization Forms.	normalization()
Numeric Ordering	Specifies whether to order numbers according to numeric value rather than collation order.	numericOrdering()

You can use the Collation.Builder class to specify values for the preceding collation options. Call the build() method to construct a Collation object as shown in the following example:

Collation.builder()
        .caseLevel(true)
        .collationAlternate(CollationAlternate.SHIFTED)
        .collationCaseFirst(CollationCaseFirst.UPPER)
        .collationMaxVariable(CollationMaxVariable.SPACE)
        .collationStrength(CollationStrength.SECONDARY)
        .locale("en_US")
        .normalization(false)
        .numericOrdering(true)
        .build();

For more information about the corresponding methods and parameters, see the API documentation for Collation.Builder.

Collation Examples

This section contains examples of how to use MongoDB operations that support collations. For each example, assume that you start with the following collection of documents:

{ "_id" : 1, "first_name" : "Klara" }
{ "_id" : 2, "first_name" : "Gunter" }
{ "_id" : 3, "first_name" : "Günter" }
{ "_id" : 4, "first_name" : "Jürgen" }
{ "_id" : 5, "first_name" : "Hannah" }

The following examples use the "de@collation=phonebook" locale and variant collation. The "de" part of the collation specifies the German locale and the "collation=phonebook" part specifies a variant. The "de" locale collation contains rules for prioritizing proper nouns, identified by capitalization of the first letter. In the "collation=phonebook" variant, characters with umlauts are ordered before the same characters without them in an ascending sort.

find() and sort() Example

The following example shows how to apply a collation when retrieving sorted results from a collection. To perform this operation, call find() on the example collection and chain the collation() and sort() methods to specify the order in which you want to receive the results.

FindPublisher<Document> findPublisher = phonebookCollection.find()
        .collation(Collation.builder()
                .locale("de@collation=phonebook").build())
        .sort(Sorts.ascending("first_name"));
Flux.from(findPublisher)
        .doOnNext(doc -> System.out.println(doc.toJson()))
        .blockLast();

When you perform this operation on the example collection, the output resembles the following:

{"_id": 3, "first_name": "Günter"}
{"_id": 2, "first_name": "Gunter"}
{"_id": 5, "first_name": "Hannah"}
{"_id": 4, "first_name": "Jürgen"}
{"_id": 1, "first_name": "Klara"}

For more information about the methods and classes mentioned in this section, see the API documentation for FindPublisher, Collation, and Sorts.

findOneAndUpdate() Example

The following example specifies a collation in a findOneAndUpdate() operation by instantiating a FindOneAndUpdateOptions object and passing it as a parameter. The example performs the following operations:

Retrieves the first document in the example collection that precedes "Gunter" in ascending order.
Sets options for the operation, including the "de@collation=phonebook" collation.
Adds a new field "verified" with the boolean value true.
Retrieves and prints the updated document.

Document updatedDoc = Mono.from(
                phonebookCollection.findOneAndUpdate(
                        Filters.lt("first_name", "Gunter"),
                        Updates.set("verified", true),
                        new FindOneAndUpdateOptions()
                                .collation(Collation.builder()
                                        .locale("de@collation=phonebook")
                                        .build())
                                .sort(Sorts.ascending("first_name"))
                                .returnDocument(ReturnDocument.AFTER)))
        .block();
if (updatedDoc != null) {
    System.out.println("Updated document: " + updatedDoc.toJson());
}

Because "Günter" sorts before "Gunter" under the "de@collation=phonebook" collation in ascending order, the preceding operation returns the following document:

Updated document: {"_id": 3, "first_name": "Günter", "verified": true}

For more information about the methods and classes mentioned in this section, see the API documentation for findOneAndUpdate(), FindOneAndUpdateOptions, and ReturnDocument.

findOneAndDelete() Example

The following example specifies a numeric ordering collation in a findOneAndDelete() operation by instantiating a FindOneAndDeleteOptions object and passing it as a parameter. The collection contains the following documents:

{ "_id" : 1, "a" : "16 apples" }
{ "_id" : 2, "a" : "84 oranges" }
{ "_id" : 3, "a" : "179 bananas" }

The collation sets the locale option to "en" and the numericOrdering option to true so that strings are sorted by their numerical value.

Document deletedDoc = Mono.from(
                numericalCollection.findOneAndDelete(
                        Filters.gt("a", "100"),
                        new FindOneAndDeleteOptions()
                                .collation(Collation.builder()
                                        .locale("en")
                                        .numericOrdering(true)
                                        .build())
                                .sort(Sorts.ascending("a"))))
        .block();
if (deletedDoc != null) {
    System.out.println("Deleted document: " + deletedDoc.toJson());
}

After you run the preceding operation, your output resembles the following:

Deleted document: {"_id": 3, "a": "179 bananas"}

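With numeric ordering enabled, the leading number in "179 bananas" compares as 179, which is greater than 100, so the preceding document is the only match. Without numeric ordering, the strings are compared character by character, so "100" sorts before "16 apples", "84 oranges", and "179 bananas", and the filter matches all three documents.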
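
The difference between the two comparisons can be sketched in plain Java. The following example is a simplified model, assuming that numeric ordering compares the leading run of digits as a number. The class, the `leadingNumber` helper, and the `greaterThan` method are hypothetical, and the code does not reproduce the full ICU behavior:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class NumericOrderingSketch {
    static final List<String> VALUES =
            List.of("16 apples", "84 oranges", "179 bananas");

    // Parses the leading run of ASCII digits, or returns -1 if there is none.
    static long leadingNumber(String s) {
        int end = 0;
        while (end < s.length() && s.charAt(end) >= '0' && s.charAt(end) <= '9') {
            end++;
        }
        return end == 0 ? -1 : Long.parseLong(s.substring(0, end));
    }

    // Character-by-character comparison, as without numeric ordering.
    static final Comparator<String> PLAIN = Comparator.naturalOrder();

    // Simplified numeric ordering: compare leading numbers, then the full text.
    static final Comparator<String> NUMERIC =
            Comparator.comparingLong(NumericOrderingSketch::leadingNumber)
                    .thenComparing(Comparator.naturalOrder());

    // Returns the values that sort after the bound, in ascending order.
    static List<String> greaterThan(List<String> values, String bound,
                                    Comparator<String> order) {
        return values.stream()
                .filter(v -> order.compare(v, bound) > 0)
                .sorted(order)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Plain:   " + greaterThan(VALUES, "100", PLAIN));
        System.out.println("Numeric: " + greaterThan(VALUES, "100", NUMERIC));
    }
}
```

In this model, the plain comparison matches all three strings and places "16 apples" first, while the numeric comparison matches only "179 bananas".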

For more information about the methods and classes mentioned in this section, see the API documentation for findOneAndDelete() and FindOneAndDeleteOptions.

Aggregation Example

The following example shows how to specify a collation in an aggregation operation. To perform an aggregation, call the aggregate() method on a MongoCollection object.

To specify a collation for an aggregation operation, call the collation() method on the AggregatePublisher returned by the aggregation operation. Specify a sort aggregation stage in your pipeline to apply the collation.

The following example constructs an aggregation pipeline on the example collection and applies a collation by specifying the following:

A group aggregation stage using Aggregates.group() to identify each document by the first_name field and use that value as the _id of the result.
An accumulator in the group stage to sum the number of instances of matching values in the first_name field.
An ascending sort on the _id field of the output documents.
A collation object specifying the German locale and a collation strength that ignores accents and umlauts.

Bson groupStage = Aggregates.group(
        "$first_name", Accumulators.sum("nameCount", 1));
Bson sortStage = Aggregates.sort(Sorts.ascending("_id"));
AggregatePublisher<Document> aggregatePublisher =
        phonebookCollection
                .aggregate(Arrays.asList(groupStage, sortStage))
                .collation(Collation.builder()
                        .locale("de")
                        .collationStrength(CollationStrength.PRIMARY)
                        .build());
Flux.from(aggregatePublisher)
        .doOnNext(doc -> System.out.println(doc.toJson()))
        .blockLast();

The preceding code outputs the following documents:

{"_id": "Gunter", "nameCount": 2}
{"_id": "Hannah", "nameCount": 1}
{"_id": "Jürgen", "nameCount": 1}
{"_id": "Klara", "nameCount": 1}
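
To see why "Gunter" and "Günter" fall into one group, the following sketch models primary-strength comparison in plain Java by removing diacritics and case before counting. This is an assumption-laden approximation: the class and the `primaryKey` helper are hypothetical, and the code does not use the driver or ICU. Unlike the aggregation output, the sketch reports the normalized key rather than one of the original values:

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class PrimaryStrengthGroupingSketch {
    // The first_name values from the example collection.
    // Unicode escapes keep the source independent of file encoding.
    static final List<String> NAMES =
            List.of("Klara", "Gunter", "G\u00fcnter", "J\u00fcrgen", "Hannah");

    // Approximates a primary-strength key: base letters only, case ignored.
    static String primaryKey(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "")
                .toLowerCase(Locale.ROOT);
    }

    // Counts names per key. TreeMap keeps the keys in ascending order.
    static Map<String, Integer> countByPrimaryKey(List<String> names) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : names) {
            counts.merge(primaryKey(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByPrimaryKey(NAMES).forEach(
                (key, count) -> System.out.println(key + ": " + count));
    }
}
```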

For more information about the methods and classes mentioned in this section, see the API documentation for aggregate(), AggregatePublisher, Aggregates, and CollationStrength.
