Atlas Search sort with collation locale

Hello,

I have a collection containing documents with mixed English and Chinese text in the “name” field, like:

[
  {
    "key": 1, 
    "name": "test"
  },
  {
    "key": 2,  
    "name": "測試" 
  }
]

I want to sort the results so that Chinese text comes before English, using a traditional Chinese collation locale (“zh_Hant”).
Performing a regular $sort works as expected:

db.collection.aggregate([
  { $sort: { name: 1 } }
], { collation: { locale: 'zh_Hant' } } )

However, when I try to specify the sort within $search, the collation locale does not work:

db.collection.aggregate([
  {
    $search: {
      index: "default",
      sort: { name: 1 }
    }
  }
], { collation: { locale: 'zh_Hant' } })

Is there a way to make the collation locale apply when sorting inside the $search stage? Or do I need to perform the sort separately after $search? Thanks a lot.

Hi @Alan_Lam1 and welcome to the community forum!!

Based on the sample data above, I tried running the $search query on the collection:

Atlas atlas-bwjpmh-shard-0 [primary] test> db.col.aggregate([ { $search: { index: "default", sort: { name: 1 } } }], { collation: { locale: 'zh_Hant' } })

and got the following error:

MongoServerError: Query should contain either operator or collector

This indicates that the search query lacks a required operator.
Do you see a similar error?

Sorting in Atlas Search is available in version 6 and higher for sharded clusters. As mentioned in the documentation, to rebuild the index for sorting you would need to define the index with:

  • Index title field as:
    • token type for sorting
    • string type for querying
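
As a rough illustration of the list above, the following sketch shows what an index definition with both types might look like. It assumes the field is name (as in the sample data, rather than title) and static mappings. The variable name indexDefinition is only for illustration, and the exact definition should be checked against the Atlas Search documentation.

```javascript
// Sketch of an Atlas Search index definition (assumed field: "name").
// The same field is indexed twice: as "string" for querying with
// operators such as text, and as "token" so it can be used in sort.
const indexDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      name: [
        { type: "string" },
        { type: "token" }
      ]
    }
  }
};

// In mongosh, on a deployment that supports the helper, this could be
// created with: db.col.createSearchIndex("default", indexDefinition)
console.log(JSON.stringify(indexDefinition, null, 2));
```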

To help you further, could you share the index definition you have defined and the query results you are seeing?

Best Regards
Aasawari

Dear @Aasawari

Thank you for your help in understanding the search error I encountered. To provide some more context:

I have an index defined with a ‘name’ field as the token type, which allows me to perform sorting on that field through the Atlas search functionality.

However, what I really need is to sort the ‘name’ field based on language-specific collation rules, similar to how aggregation supports sorting with the collation option.

The data contains non-English names, so a simple alphabetical sort does not produce accurate results. I would like to leverage the collation rules to sort names correctly for the specified language (in my case, Chinese).

Is it possible to configure the Atlas search sorting to use collation, analogous to how aggregation currently supports this?

Regards,
Alan

Hi @Alan_Lam1

Yes, it is possible to use collation.

Can you try the query below on your sample data?

db.col.aggregate([{ "$search": { "text": { "path": "name", "query": "test" }, "sort": { "name": 1 } } }], { collation: { locale: 'zh_Hant' } } )

The above query is just an example; you can add or remove keys and values as your query requires.

Thanks
Aasawari

Hi @Aasawari

Thank you for the suggestion to use collation with the $search aggregation stage. I tried it on my sample data but I am still seeing some issues.

My data contains two Chinese words: “世紀” and “中國”. When sorting these in ascending order without $search, specifying the zh_Hant collation works as expected and returns the order as “中國” then “世紀”.

However, when using $search with the sort field inside, as in the example you provided, the zh_Hant collation is not applied, and it returns the words in lexical order instead of the collation order: “世紀” then “中國”.
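
The difference between the two orders can be reproduced outside the database. The sketch below is plain JavaScript rather than an Atlas query: a default sort compares code units, and 世 (U+4E16) precedes 中 (U+4E2D), which matches the order returned by $search above. The Intl.Collator part assumes a runtime with ICU data for zh-Hant, and its result is not guaranteed to match the server's collation.

```javascript
const names = ["中國", "世紀"];

// Default sort compares UTF-16 code units: 世 (U+4E16) < 中 (U+4E2D),
// so this yields 世紀 first, then 中國.
const byCodeUnit = [...names].sort();
console.log(byCodeUnit);

// Locale-aware comparison; the outcome depends on the ICU data
// available in the runtime, so no expected output is claimed here.
const collator = new Intl.Collator("zh-Hant");
const byLocale = [...names].sort(collator.compare);
console.log(byLocale);
```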

I’m wondering whether collation is not fully supported in the sort field of the $search stage. I’m using MongoDB Atlas v5.0.25.

Regards,
Alan

At this time, collations are not supported by $search. If this is something you are interested in, I recommend opening a request in the MongoDB Feedback Portal.
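
Until then, one possible workaround is to leave sort out of $search and add a separate $sort stage, so that the collation passed to aggregate applies to that stage as in the plain $sort example earlier in the thread. The sketch below only builds the pipeline. buildSearchThenSort is a hypothetical helper, and the index name, path, and query are assumptions taken from the examples above. Note that a $sort after $search discards relevance ordering and is handled by the database rather than by the search index, which may be slow on large result sets.

```javascript
// Hypothetical helper: search first, then sort in a separate stage.
function buildSearchThenSort(query) {
  return [
    // $search carries no "sort" option here.
    { $search: { index: "default", text: { path: "name", query: query } } },
    // Regular $sort stage; the aggregate-level collation applies to it.
    { $sort: { name: 1 } }
  ];
}

const pipeline = buildSearchThenSort("test");
const options = { collation: { locale: "zh_Hant" } };

// In mongosh this would be run as: db.col.aggregate(pipeline, options)
console.log(JSON.stringify(pipeline), JSON.stringify(options));
```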

I have a similar issue with French, using accented characters and numericOrdering: true.

This feels like an essential feature.