Error using explain() with collation

Has anyone else encountered this issue before? I can run an explain on an aggregation pipeline or find query, but as soon as I add a collation to the aggregation/query, I get an “Invalid UTF-8 string in BSON document” error. It occurs in both Compass and the shell.

The collation I’m using is locale: ‘en’ and numericOrdering: true.

Can you provide the full command you are sending to the server, and the server version? What you pasted looks partial, and the explain() seems to be in the wrong place (it should come after the collection, before aggregate or find).

Asya
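For readers following along, here is a rough sketch of what "explain first, then find or aggregate, with a collation" amounts to when written as the server's `explain` command document. The helper name `buildExplainCommand`, the collection name `items`, and the empty filter and pipeline are placeholders for illustration, not the original poster's actual command.

```javascript
// Sketch only: builds an `explain` command document wrapping either a find
// or an aggregate, with an optional collation. Names and values are
// placeholders, not taken from the original post.
function buildExplainCommand(collection, { filter, pipeline }, collation) {
  const inner = pipeline
    ? { aggregate: collection, pipeline, cursor: {} }
    : { find: collection, filter: filter ?? {} };
  if (collation) {
    inner.collation = collation;
  }
  return { explain: inner, verbosity: "queryPlanner" };
}

// The collation mentioned in the question.
const collation = { locale: "en", numericOrdering: true };

const findCmd = buildExplainCommand("items", { filter: {} }, collation);
const aggCmd = buildExplainCommand("items", { pipeline: [{ $match: {} }] }, collation);

console.log(JSON.stringify(findCmd, null, 2));
console.log(JSON.stringify(aggCmd, null, 2));
```

In a shell session, a document like this could be passed to `db.runCommand(...)`. The shell helpers in the order described above, `db.items.explain().find(...)` or `db.items.explain().aggregate(...)`, should produce a broadly similar request.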

@Aysa_Kamsky thanks for the reply! I moved the explain() and still get the same results.

Server version: 4.4.0 (local single server instance, no replication, no sharding)

In Compass, with and without collation: (screenshot not preserved)

In mongosh, with and without collation: (screenshot not preserved)

I tried this with mongosh and I cannot reproduce it so maybe it’s related to data in your collection? Does this happen when you run this explain on a different collection?

This could be related to COMPASS-4944 and one way to check would be to try this in the legacy MongoDB shell (mongo rather than mongosh) as it’s not based on the Node.js driver and might handle invalid UTF-8 differently (showing where the error is).

I suspect the collation may be a red herring: the invalid document could be skipped with the simple collation and matched with the specified collation, so the error is only encountered with one and not the other. Do you get this error at all without explain?

Asya

Thanks again Asya!

It works in the legacy shell with and without collation so it must be related to mongosh or Compass even though the ticket status is resolved (and I’m using the latest version of Compass).

I’ve played around with the options and the query crashes in mongosh and Compass when numericOrdering is a part of the collation. All of the other collation options work without error.

I took your suggestion and tried it with other collections and didn’t receive an error using collation with numericOrdering, so there must be an issue in my dataset.

Now I know I can refocus my search on figuring out how to query for invalid UTF-8 strings in my dataset. :smiley:
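As a starting point for that search, here is a minimal sketch of checking whether a sequence of bytes is valid UTF-8 in JavaScript, using the standard `TextDecoder` with `fatal: true`, which throws on malformed input. It assumes you have already obtained the raw bytes of each string field by some means (not shown here). The function names and the sample field names and values are hypothetical.

```javascript
// Sketch only: returns true when the given bytes decode as strict UTF-8.
// A fresh decoder is created per call to avoid relying on decoder state.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// Hypothetical input shape: field name -> raw bytes of that field's string.
function findInvalidFields(rawFields) {
  return Object.entries(rawFields)
    .filter(([, bytes]) => !isValidUtf8(bytes))
    .map(([name]) => name);
}

// Made-up sample data for illustration.
const sample = {
  title: Uint8Array.from([0x61, 0x62, 0x63]), // "abc"
  sku: Uint8Array.from([0x31, 0x30, 0xff]),   // "10" followed by 0xFF, which never occurs in valid UTF-8
  note: Uint8Array.from([0xc3, 0xa9]),        // "é" as a two-byte sequence
};

console.log(findInvalidFields(sample));
```

Running a check like this over each document's string fields would narrow down which document and field the shell is choking on.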
