Aggregate search filter for field being zero length array

zacharykane · April 1, 2022, 11:04pm

Hello mongo people!

I’m trying to use the full text search feature with my Atlas collection. That seems to be working great. I also want the results to be limited to documents where a certain field - that is an array - is empty/has a zero length. I’ve been looking over the docs for the aggregate search pipeline but nothing seems to handle this that I can find.

Previously, I was using the find() api like this:

collection.find({ tags: { $size: 0 } });

I feel like I should include this filter before the text search of other fields since it would greatly limit was needs… searched. What does that operation look like in an aggregate pipeline?

Right now, I’m text searching like this:

const cursor = collection.aggregate([
  {
    $search: {
      compound: {
        must: [
          {
            text: {
              query: 'style',
              path: ['title'],
            },
          },
        ],
      },
    },
  },
]);

This is getting my a cursor of the result of docs where the title field does contain style, so that’s great! I was thinking this could be a compound operation since I also want to filter out the tags array field to empty ones but I might be mistaken there too.

Any ideas on how to accomplish this query?

NeNaD · April 2, 2022, 8:35am

Hi,

You can add one Aggregation stage in the beginning of the Aggregation pipeline.

$match with $expr - to filter documents based on some expression
$eq with $size - to return only documents where tags array property is an empty array

db.collection.aggregate([
  {
    "$match": {
      "$expr": {
        "$eq": [
          0,
          {
            $size: "$tags"
          }
        ]
      }
    }
  }
])

Working example

steevej · April 2, 2022, 12:48pm

According to $search documentation $seach must be the first stage of the pipeline.

To test the size of an array I prefer to use:

{ "$match" : { "$tags.0" : { "$exists" : false } } }

The problem with $size is that it causes an error when the field tags does not exist with

MongoServerError: PlanExecutor error during aggregation :: caused by :: The argument to $size must be an array, but was of type: missing

It also causes an error when the field tags is not an array with:

MongoServerError: PlanExecutor error during aggregation :: caused by :: The argument to $size must be an array, but was of type: string

Since I am a big fan of the schema less nature of JSON and MongoDB, I prefer to use queries that accommodate that without error.

All this being written, I have to admit that the following variant of $size also works without error.

c.aggregate( { "$match" : { "tags" : { $size : 0 }} })

All this means that your filtering of empty tags array must be done after the $search.

However, something that I have not tried, would be to set a partial index with your tags size as a condition and then specify the index to use in your $search. Since the documents indexed will only be the one with $size:{$gt:0}, you kind of implicitly filter them out.

zacharykane · April 3, 2022, 6:18pm

According to $search documentation $seach must be the first stage of the pipeline.

Yes, exactly. It was this requirement that kind of baffled me. I thought match or equals might be the answer, but wasn’t sure how they were shaped.

Given that the text search has to be first, it does imply to me that all records will have their text fields inspected, and then whole swathes will be removed on the (in my mind) more simpler tags match. Is this not an inefficiency to mongo’s internals?

steevej · April 3, 2022, 9:01pm

That is not exactly how search and indexes work. If you do not have an index, then yes each record has to be verified. With search and indexes, it is much more efficient. More or less the index is inspected, not the documents. Finding a value inside an index is very efficient compared to scanning each document. The course M201 presents a good overview of indexes.

For example, if a document does not contain the indexed field, it takes almost no space in the index and is not examined when looking for normal value.