MongoDB - Slow aggregate query

It takes around 6min to get the result from MongoDB, when I use the following aggregate query.

    db.barcodes.aggregate([
        {
            $lookup: {
                from: 'company',
                localField: 'company',
                foreignField: '_id',
                as: 'company'
            }
        },
        {
            $match: {
                'company.name': 'ABCd'
            }
        }
    ]);

I have two collections in my DB, company and barcode. If I search with text ‘ABC’ (instead of ‘ABCd’ , company name ‘ABC’ already exists in the DB) it takes only 0.05Sec to complete the result.

Total 42,14,301 documents in barcode collection and 2 documents in company collection.

Sample documents

Company

    { 
        "_id" : ObjectId("615dd7873c4f710b71438772"), 
        "name" : "ABC", 
        "isActive" : true
    }

Barcode

    { 
        "_id" : ObjectId("615dd8ff3c4f710b71438773"), 
        "barcode" : "1", 
        "company" : ObjectId("615dd7873c4f710b71438772"), 
        "comment" : "text 1"
    }

Indexed fields

  • company._id
  • company.name
  • company.isActive
  • barcode.company

Mongo clients used: Studio 3t and MongoDB CLI

Output of explain

    { 
        "stages" : [
            {
                "$cursor" : {
                    "query" : {

                    }, 
                    "queryPlanner" : {
                        "plannerVersion" : 1.0, 
                        "namespace" : "diet.barcodes", 
                        "indexFilterSet" : false, 
                        "parsedQuery" : {

                        }, 
                        "winningPlan" : {
                            "stage" : "COLLSCAN", 
                            "direction" : "forward"
                        }, 
                        "rejectedPlans" : [

                        ]
                    }
                }
            }, 
            {
                "$lookup" : {
                    "from" : "company", 
                    "as" : "company", 
                    "localField" : "company", 
                    "foreignField" : "_id"
                }
            }, 
            {
                "$match" : {
                    "company.name" : {
                        "$eq" : "ABCd"
                    }
                }
            }
        ], 
        "ok" : 1.0
    }

The biggest issue is that your $match is after the $lookup.

Because you $lookup for all and then you about the result of a single company.

You should start your aggregation on the company collection by first $match the wanted company. This way you barcode $lookup only occurs for the wanted company. You will need an index on barcode for the compagny field.

1 Like

Another simple solution if the company name is unique: drop the useless ObjectId that represents the company.

Your collections would look like this:

Company

{
  "_id" : "ABC",
  "isActive" : true
}

Barcode

{
  "_id" : ObjectId("615dd8ff3c4f710b71438773"),
  "barcode" : "1",
  "company" : "ABC",
  "comment" : "text 1"
}

If the name was the only information from the company that you needed, then you don’t need the lookup anymore and your query will be fast with the index {company:1} in the barcodes collection.
Worst case scenario if you need some fields from the company document, you can now perform the $match first - which will be supported by the index - and then the $lookup on a restricted number of documents - which is better of course.

Another way to improve the speed of this query is to embed the company document in the barcodes if duplication is possible here. This would remove completely the need for a $lookup but would make the updates on company fields more complicated as they would need to be reported in all the barcodes associated to this company.

Another² solution would also be to embed in the barcode document only the company fields that will never change. Your document would look like:

Barcode

{
  "_id" : ObjectId("615dd8ff3c4f710b71438773"),
  "barcode" : "1",
  "company" : {
    "name": "ABC",
    "company_tax_code": 214365,
    "change": "never I will"
  },
  "comment" : "text 1"
}

Technically you could still update these values if you need to with an updateMany command (eventually secured by a Multi-Document ACID Transactions), but this must stay something exceptional.

Cheers,
Maxime.

3 Likes

Thank you Maxime. I think, last solution is suitable for me.

2 Likes