$lookup + $sort + index

Hi,
I was able to follow https://docs.mongodb.com/v5.0/reference/operator/aggregation/lookup/#examples to create a lookup aggregation between 2 collections. (MongoDB 4.4)
However, when I try to add $sort, for larger datasets like 100K documents, the query times out and no results are returned. If the dataset is small, like 50 documents, it’s fine.

Any suggestions? I followed ESR rule and setup the following:

Collection 1:  source_products
 - source_id
 - gid
 - title
 
index({source_id: 1, gid: 1}, {unique: true})
index({source_id: 1, title: 1`})

Collection 2:  destination_products
 - target_gid

Here’s the aggregation, the explain plan shows that source_id_1_gid_1 is being used, so it seems that the sort on title isn’t using the correct index.

db.source_products.aggregate(
          [
            {
              '$lookup' => {
                from: 'destination_products',
                localField: 'gid',
                foreignField: 'target_gid',
                as: 'destination_product'
              }
            },
            {
              '$sort' => {
                title: 1
              }
            }
          ]
        )

I also tried these indices and they aren’t being used either:

index({source_id: 1, gid: 1, title: 1})
index({source_id: 1, title: 1})

Hello @netwire,

There are few things you need to understand,

How compound index works?

MongoDB can use the index to support queries on the index prefixes. As such, MongoDB can use the index for queries on the following fields in $match stage/query part:

  • Case 1:

    index({source_id: 1, title: 1})

    • the source_id field,
      { "source_id": "123" }
    • the source_id field and the title field,
      { "source_id": "123", "title": "search name" }
  • Case 2:

    index({source_id: 1, gid: 1, title: 1})

    • the source_id field,
      { "source_id": "123" }
    • the source_id field and the gid field,
      { "source_id": "123", "gid": "123" }
    • the source_id field and the gid field and the title field.
      { "source_id": "123", "gid": "123", "title": "search name" }
    • the source_id field and and the title field.
      { "source_id": "123", "title": "search name" }

Note: If you place a $match at the very beginning of a pipeline, the query can take advantage of indexes like any other db.collection.find() or db.collection.findOne() .

Use index in $sort operation:

The $sort operator can take advantage of an index if it’s used in the first stage of a pipeline or if it’s only preceeded by a $match stage.

1 Like

In a nutshell, your $lookup doesn’t affect the title field so you should be sorting first and that would be able to use any index that starts with title (or just a single index on title).

Try adding title:1 index to source_products and switch the two agg stages to have $sort first and then $lookup.

Asya

Thanks @Asya_Kamsky, @turivishal ,

I totally forgot the importance of ordering in MongoDB stages. I have since changed the aggregation to the following and added:

  1. search for title via regex, but query can also be null which should show all docs
  2. pagination, both forwards & backwards
  3. index {source_id: 1, title: 1, _id: 1}

I think forward pagination is now working. However, when I try to page backwards, instead of going back 50 documents, it goes back to the start of the collection.

  • Am I doing something wrong? For example, should the $match be on title & _id?
  • Does this work in cases if title isn’t unique?
  • Do I need to specify -1 for sort order for either title or _id?

P.S. The reason why I have 'title' => {'$exists' => true}, is because I need to support paging with

  1. search for specific titles via regex 'title' => /#{query}/i , as well as
  2. search all titles via 'title' => {'$exists' => true}
db.source_products.aggregate(
[
    {
      '$match' => {
        'source_id' => source.id,
        'title' => {'$exists' => true},
        '_id' => {'$lt' => BSON::ObjectId(before_cursor)}
      }
    },
    {
      '$sort' => {
        title: 1,
        _id: -1
      }
    },
    {
      '$limit' => 50
    },
    {
      '$lookup' => {
        from: 'destination_products',
        localField: 'gid',
        foreignField: 'target_gid',
        as: 'destination_product'
      }
    }
  ]
) 

when I try to page backwards, instead of going back 50 documents, it goes back to the start of the collection

I’m not sure what you are trying to implement - are you talking about the case where the user clicks next page (you fetch next 50 documents greater than last one displayed) and now they click previous page and you need to go back to previous 50? I can’t think of a simple way to do that reliably without keeping "before_cursor" value for previous page(s). I’m sure there is one but they all seem a bit awkward, or involve two queries…

Maybe someone else can think of a simple way to do it in a single query.

Asya