Let’s assume we have a collection with field name, which we need to sort using collation. Additionally, we have a second collection, which is going to be joined to the first collection on _id fields of both collections. _id fields are being set by ourselves (it is a string identifier for each document and consists of digits only).
There is an index on field name in first collection. Entire first collection has collation settings
{ collation: { locale: "pl", strength: 2 } }
Example of the pipeline:
[
{
$sort: {
name: 1
}
},
{
$lookup: {
from: "col2",
localField: "_id",
foreignField: "_id",
as: "joined"
}
},
{
$match: {
joined: {
$ne: []
}
}
},
{
$limit: 10
}
]
It seems, that even if collation is needed only to sort name in the first collection, it is used as well when comparing both _id fields between collections in lookup stage, therefore unless there is also an index with the same collation settings on joined collection (index on _id) the query is very slow.
Benchmarks: (aggregation is performed on col1 and col2 is being joined)
collation is not being used - 309ms
col1 with collation, col2 without collation - 4296ms
col1 with collation, col2 with collation - 266ms
So this is my question - it is not convenient to set collation for col2, as it is being created every day anew. Is there a way, to sort first collection using collation, but to not suffer from very slow lookup operation?
I have already tested setting collation only on index itself, not the entire collection and using collation options in aggregate, but the result is the same or even worse.