I will answer this first. Index can only cover those queries that have the same collation. So when you create your index, you can specify the collation that your queries will use to the index. For example, if your query use en collation
Above index will cover any queries that use { collation: { locale: "en" } }.
Now you know how to create index with a specific collation. The reason an index cannot cover queries that use different collation is that the underlying index key data are different. To clearly demonstrate this point, you can create another index with different collation, for example, using default simple collation.
Now we can issue the same query/aggregate to see the different
db.collectionName.aggregate([.../** use dateString to group */], { collation: { locale: "en" } }).explain()
/** You will see something like this
* {
* ...
* indexBounds: {
* dateString: [
* '[CollationKey(0x1612161a050e1412050e1214010e010e), CollationKey(0x1612161a050e1412050e1814010e010e))'
* ]
* }
* ...
* }
db.collectionName.aggregate([.../** use dateString to group */], { collation: { locale: "simple" } }).explain()
/** You will see something like this
* {
* ...
* indexBounds: { dateString: [ '["2024-10-01", "2024-10-31")' ] }
* ...
* }
The simple collation is actually the raw binary data, so the index key don’t need to encode anything, that’s why you see the actual string date here. But en collation need to encode your string to the CollationKey so that it can sort accordingly. But since the data is encoded, if you issue query with simple collation on en collation index, it will compare "2024-10-01" with a bunch of CollationKey(0x...). You can see now it’s quite impossible for it to know the value is matched or not.
Then there are 3 cases here.
If you only have en collation index and issue simple collation query, the index is useless, so your query will use COLLSCAN to get the documents, then get dateString field form them, then pipe to $group stage.
If you have en collation and issue en collation query, the index can be used now, but the key is not your desired value, it’s encoded. So after IXSCAN to set matching keys, your query need to FETCH the actual documents of those keys, get dateString from documents then pipe to $group stage.
If you have simple collation and issue simple query, the index can be used, and it has raw data in it. So your query is covered, so PROJECTION_COVERED can get the field data directly from index key, then pipe to $group.
TL;DR: Index keys in collated index (not simple) are encoded, so your query always need to fetch the document to get field value.
Since we used the matching collation in the collection and the query (both German), the reason for the FETCH stage is that within the collated index, the key value is encoded and thereby useless for $group, even though the actual string data (e. g. “2024-10-01”) would not appear to require a change to be displayed in German (2024-10-01 is the same in German and in English; but technically, collations transform this differently). I understand that.