What is the most performant way to check an array field is not empty?

We’re having issues with a query not using any of our indexes. We have a users table, and each record has an array field called track_ids. We need to query all users who have track_ids (i.e. not a blank array).

The query we use is { track_ids: { $ne: [] } } which works fine. But it was performing a COLSCAN over 100,000 of users, taking up to 30s in the worst case. So we created an index on the track_ids column, but it is still doing a COLSCAN. Even when I use hint to specify the track_ids index, it uses the index, and then still proceeds to do a FETCH for the $ne condition.

So I looked into using partial indexes, and make the partial index condition the same as the query, but partial indexes do not support $ne conditions, so that approach didn’t pan out.

Other queries I’ve tried:

{ track_ids: { $not: { $size: 0 } } }

{ "track_ids.0": { $not: { $eq: null } } }

None off these use the index, and as a result, fallback to COLSCAN.

What is the best way to perform this type of query so that it uses an index?

Hey @Kieran_Pilkington,

Have you found any solution to this problem yet? If yes, it would be great if you can share the solution with the rest of the community too as it might help others facing a similar issue as yours. If not, kindly share the following for us to be better able to help you out and replicate this behavior at our end:

  • sample documents
  • the existing index
  • the queries that you have tried
  • output of explain(‘executionStats’)
  • the expected output

Also, I would like to point out one thing. The inequality operator $ne or $nin are not very selective since they often match a large portion of the index. As a result, in many cases, a $ne query with an index may perform no better than a $ne query that must scan all documents in a collection. You can read more about this behavior here: $ne

Another thing to note here is that the query { track_ids: { $ne: [] } } will return documents if the track_ids is not an empty array . Meaning if the field doesn’t exist, it will also return it. To confirm this, I created a sample collection with the following documents:

> db.test.find()
[
  { _id: 0, track_ids: [ 1, 2, 3 ] },
  { _id: 1, track_ids: [] },
  { _id: 2 }
]

we I used the query db.test.find({track_ids:{$ne:[]}}), I got the documents back:

[ 
  { _id: 0, track_ids: [ 1, 2, 3 ] },
  { _id: 2 }
]

I would suggest you that if you need to do this frequently, then have a separate field that can be indexed, e.g. track_ids_empty where it can be true or false . Update this field accordingly if you modify the document. This way, you can index that field and can efficiently get documents with empty track_ids

Hope this helps.

Regards,
Satyam