Improving MongoDB queries having $facet stage

In my query I have the first stage with $match to filter common conditions. And I want to perform a different set of conditions on the result set, hence using $facet. In the documents I saw $facet stage doesn’t use indexes. Probably because of this reason my query is very slow.

Here’s my data look like:

    PROPERTY_ID OWNER_NAME  ISFORSALE   ISFORECLOSURE   ISFORRENT   CITY
        100         AA        true          false          true       abc
        101         AA        true          false          false      xyz
        102         BB        true          true           true       abc
        103         BB        true          false          false      abc
        104         BB        false         false          true       abc
        105         BB        true          false          true       abc
        106         CC        true          false          false      xyz

I have my query like this, which is to get summarised counts:

    [{
      $match: {CITY: 'abc'}
    }, 
    {
      $facet: {
        SALE: [{$match: {ISFORSALE: true}},{$count: 'count'}],
        FORECLOSURE: [{$match: {ISFORECLOSURE: true}},{$count: 'count'}],
        RENT: [{$match: {ISFORRENT: true}},{$count: 'count'}],
        RENT_SALE: [{$match: {ISFORRENT: true, ISFORSALE:true}},{$count: 'count'}],
        MULTIOWER: [{$match: {ISFORSALE: true } }, {$group: {_id: '$OWNER_NAME', count: {$sum: 1 } } }, {$match: {count: {$gte: 2, $lte: 4 } } }, {$count: 'count'} ]
      }
    }, 
    {
      $project: {
        SALE_CNT: {
          $cond: {
            'if': {$gt: [{$size: '$SALE'},0]},then: {$first: '$SALE.count'},
            'else': 0
          }
        },
        FORECLOSURE_CNT: {
          $cond: {
            'if': {$gt: [{$size: '$FORECLOSURE'},0]},then: {$first: '$FORECLOSURE.count'},
            'else': 0
          }
        },
        RENT_CNT: {
          $cond: {
            'if': {$gt: [{$size: '$RENT'},0]},then: {$first: '$RENT.count'},
            'else': 0
          }
        },
        RENT_SALE_CNT: {
          $cond: {
            'if': {$gt: [{$size: '$RENT_SALE'},0]},then: {$first: '$RENT_SALE.count'},
            'else': 0
          }
        },
        MULTIOWER_CNT: {
          $cond: {
            'if': {$gt: [{$size: '$MULTIOWER'},0]},then: {$first: '$MULTIOWER.count'},
            'else': 0
          }
        }
      }
    }]

I’ve added index for the city, hence the match stages outputs the results very quickly.

But after reviewing the execution stats and some tests, I saw the $facet stage is taking too much time, and also it doesn’t use indexes. My DB has nearly 150 million documents.

Total # of docs returned by the match stage would be roughly 800,000.
And not all docs satisfy the conditions I’ve mentioned as shown in the example dataset.

Is there a way in Mongo to either skip the facet but a way to do the above query efficiently (using the indexes) or any other different way?

A few things.

  1. The way you do $facet looks like something that should be done with $group.
  2. Boolean fields like you have are, in my opinion, waste full in terms of space and make indexing and other things hard to do.
  3. I would use a different strategy to encode the 3 boolean fields. An array of integer (or string for readability) state could be use, for example, the array would contains 1 if for sale, 2 if for forclosure, and 3 for for rent. Your documents above would look like:
PROPERTY_ID OWNER_NAME  STATE     CITY
100         AA          [1,2]     abc
101         AA          [1]       xyz
102         BB          [1,2,3]   abc
...

With this you can use $group rather than $facet. An simple integer could also be used rather than an array where the first digit represent for sale, the 2nd fore closure.

It is a little bit less elegant to override a field like that but more efficient. And more flexible, it is easier to add a state using an array or the integer. With is_for_sale, is_for_rent columns you would need to add a new column to handle the new state. With the array it is just another value.

@steevej agree with your suggestion to use $group. Already did a test by changing the query to the form: $group: {_id: null, SALE: {$sum:}}. And the query is efficient than the $facet.

But still I couldn’t figure out a way to do the following query (5th subquery in the $facet):
MULTIOWER: [{$match: {ISFORSALE: true } }, {$group: {_id: '$OWNER_NAME', count: {$sum: 1 } } }, {$match: {count: {$gte: 2, $lte: 4 } } }, {$count: 'count'} ] }

Intention of this subquery is to find the #of owners having multiple properties for sale. What I know is $group cannot be added within another $group. What’s the best way to do that?

I am still thinking about it but have nothing to propose so far that can be done in a single DB access.

I tested the following method:

Still having the same $facet stage, but until before, I have a $group subprocess to handle
SALE, FORECLOSURE, RENT, RENT_SALE and the exiting subprocess for MULTIOWER. So it looks like:

$facet:{
{
  res1:[{  $group:    {
  _id: null,
  SALE: {$sum: { $arrayElemAt: [ "$STATE", 0 ] }},
  FORECLOSURE: {$sum: { $arrayElemAt: [ "$STATE", 1 ] }},
  RENT: {$sum: { $arrayElemAt: [ "$STATE", 2 ] }},
}}],
MULTIOWER: [{$match: {{ $arrayElemAt: [ "$STATE", 0 ] }: 1 } }, {$group: {_id: '$OWNER_NAME', count: {$sum: 1 } } }, {$match: {count: {$gte: 2, $lte: 4 } } }, {$count: 'count'} ]
}
}

Now it’s faster than the first query, but may not be the ideal

1 Like