Sorting on multiple fields that may or may not be Arrays (“cannot sort with keys that are parallel arrays”)

I have an issue with indexing and sorting of fields with unknown type in my use case. This means they may be an Array in one document, but the same field in another document could be something else.

Now, you may know Mongo doesn’t allow indexing or sorting on multiple Array fields it gives the following error if you try : “cannot sort with keys that are parallel arrays”

As an exemple to explain this post I’ll use the following index : {a:1,b:1}
Keep in mind field a and field b can be of any type in this exemple (Object, Array, number, string, Date etc…) and the type can be different between documents.

In this exemple if I try to insert the following document it works :
{a:1, b:2}
{a:[1,2,3],b:"hi"}

But if I try to insert a document like this :
{a:[1,2,3], b:["a","b","c"]}

I get an error.

I don’t know what to do to fix this issue, and this restriction makes no sense.
https://jira.mongodb.org/browse/SERVER-826

I absolutely need in my use case to be able to sort on arbitrary fields, and I’ve yet to come up with a workaround that makes sense and won’t impact performance too much.

The workaround I came up with :

  1. Create a copy of all fields for sorting purposes where Arrays are converted to strings or another format (Obviously this is very bad because it duplicates all data)
    1.exemple : {a:1, a_sort:1, b:[1,2,3], b_sort:"[1,2,3]"} and the index : {a_sort:1, b_sort:1}

  2. Replace the $sort aggregation stage with a $function aggregation performing a custom sort, this is also very bad because we lose all benefits from indexes and the optimization on the $limit stage (limit stage optimization can be somewhat emulated in the function by returning immediatly false in the filter, but it won’t be as efficient as the mongo one)

Are there other workarounds I haven’t though of ? Will the Mongo Team ever give us an option to bypass this limitation since from what I can tell it’s only meant to stupid proof against very inefficient sorts ? But in my case my workarounds would certainly result in worse performances than what I could get from a regular sort.

Any help is appreciated

I’m still looking for better workarounds, I’m not satified with the workarounds I came up with, I feel like the trade-offs are too important.

I did came up with another potential solution :

To never store an array and always store an object with keys “entry_0”, “entry_1”, “entry_2” and convert into an array in application code, also a very bad workaround because we lose access to $push, $pull, $in and other operators that work on arrays and it adds work to the application to convert to and from arrays.

I hope I can find a better workaround…

I came up with a better version of this workaround :
Never store arrays, but store an object {array:[ ]} essentially forcing mongo to treat it as an object instead of an array. This is less tedious than the previous option, because we can still do $push, $pull etc… on the inner array. This also means a lot of changes to the application code in order to extract the array from the object but is much better than the previous option.

I really shouldn’t need to do that kind of things

Hello I have the same issue, needing to sort and multiple sort on arbitrary fields, and some of them are array fields. I have implemented a solution like the original poster suggested, and it seemed to work well to solve the sorting issue. It actually seems like the sort is functioning correctly, even though the array is now embedded in another field. Can anyone explain why this works?

Finally I added an @addFields stage after my $sort and $limit stages, in order to cast the arrays back to their original position in the projection object, which means I did not have to make any updates to application code. I’m curious if there are any drawbacks to this approach that I should consider.

Here is an example of my last 3 aggregation stages:

{ 
  $project: {
    array_field_1: {
      _ids: "$array_field_1"
    },
    array_field_2: {
      _ids: "$array_field_2"
    }
  }
},
{
  $sort: {
    array_field_1: 1,
    array_field_2: -1,
  }
},
{
  $addFields: {
    array_field_1: "$array_field_1._ids",
    array_field_2: "$array_field_2._ids"
  }
}  

The reason this works is because the limit on sorting on Array fields is entirely arbitrary and Mongo sort object fields like this :
MongoDB’s comparison of BSON objects uses the following order: (https://www.mongodb.com/docs/manual/reference/bson-type-comparison-order/#std-label-bson-types-comparison-order)

  1. Recursively compare key-value pairs in the order that they appear within the BSON object.
  2. Compare the field types. MongoDB uses the following comparison order for field types, from lowest to highest:
    1. MinKey (internal type)
    1. Null
    1. Numbers (ints, longs, doubles, decimals)
    1. Symbol, String
    1. Object
    1. Array
    1. BinData
    1. ObjectId
    1. Boolean
    1. Date
    1. Timestamp
    1. Regular Expression
    1. MaxKey (internal type)
  1. If the field types are equal, compare the key field names.
  2. If the key field names are equal, compare the field values.
  3. If the field values are equal, compare the next key/value pair (return to step 1). An object without further pairs is less than an object with further pairs.

This means that has long as the objects being compared have the same keys, in this case a single array field, Mongo will compare the Arrays.

This ends up being slower than if you could compare Arrays directly, but bypasses the arbitrary limitation on sorting on Array fields.

The reason I suspects Mongo decided to not allow sorting on multiple Array fields even if it’s clearly possible is to prevent inefficient sorts on large arrays since it has the potential to take exponentially more time to sort.

The aggregation you posted won’t use indexes for sorting, but it might not be a problem for you depending on the amount of data that needs to be sorted.

For my use case, I ended up making a duplicate field only used for sorting where I convert my arrays to a string representation.

2 Likes