$project performance drop since v4.4

Hi,

My app is using this kind of approach to pick some parts of a (sometimes very large) document (here I’m picking the 30 first among possibly thousands):

db.variantRunData.aggregate([
	{
        "$project": {"g0" : [
                          "$sp.1.gt","$sp.2.gt","$sp.3.gt","$sp.4.gt","$sp.5.gt","$sp.6.gt","$sp.7.gt","$sp.8.gt","$sp.9.gt","$sp.10.gt"
                          ,"$sp.11.gt","$sp.12.gt","$sp.13.gt","$sp.14.gt","$sp.15.gt","$sp.16.gt","$sp.17.gt","$sp.18.gt","$sp.19.gt","$sp.20.gt"
                          ,"$sp.21.gt","$sp.22.gt","$sp.23.gt","$sp.24.gt","$sp.25.gt","$sp.26.gt","$sp.27.gt","$sp.28.gt","$sp.29.gt","$sp.30.gt"
                        ]}
	}
	, { "$count": "count" }
]);

The real query can actually be much longer than that but it was identified that this first bit (creation of a table from a dynamic list of indices) takes almost twice as much time to execute in any version from 4.4 than in 4.2. I used the explain() method to investigate the problem and found that on 4.4+ the winningPlan stages are PROJECTION_DEFAULT then FETCH then IXSCAN whereas on 4.2 it goes straight to FETCH AND IXSCAN. A certified MongoDB consultant told me that 4.4+ is doing more work when looking up each field in the sp object, whereas in 4.2, the find() layer is pulling out all fields in one pass over the BSON object, and then storing each in a cached Document. I submitted this problem to him 3 years ago and an internal ticket was filed at that time.

I left this problem aside since then, sticking to v4.2 and hoping this regression would we solved some day. But I just tested v5, v6, v7, and the performance problem is still there. I really can’t stick to v4.2 forever as it is already causing libssl-related issues when installing it on recent Linux distributions.

Using $getField for each field does not help. Maybe if it supported dynamic field names I could try to nest it in a $map but it can’t be done at the moment.

My questions are:

  • Any chance I can get things to work like before? I doubt that changes decided over 3 years ago could be reverted just for me… But perhaps there is a configuration parameter that I don’t know of, and that makes MongoDB behave as it used to?

  • Do you have a suggestion on how to re-organize my data so it wouldn’t take this much time to pick those sub-items and create that table?

I’m really stuck, hope someone can help. Thanks in advance,

Guilhem

Hey @Guilhem_SEMPERE,

Welcome back to the MongoDB Community!

Could you please provide me with sample documents and the deployment configuration details of your MongoDB server? Also, I’m curious to know more about your testing workflow for different versions of MongoDB. Could you share the workflow you used and the sample sizes you tested with?

Here, the PROJECTION_DEFAULT serves as a “default” projection, triggered when no explicit projection is requested. However, could you please assist me in better understanding by providing the explain() output of the query from both versions or the version you have tested?

Please also provide the expected output result with the sample data above requested.

Looking forward to your response.

Best regards,
Kushagra

Hi Kushagra. Let me provide you means to test this on your side.

Here is a script that generates sample data:

var sampleGtArray = [
	{"gt":"0/1", "ai":{"AD":"2,1", "DP":NumberInt(3), "PL":"34,0,80"}},
	{"gt":"0/0", "ai":{"AD":"1,0", "DP":NumberInt(1), "PL":"0,3,41"}},
	{"gt":"1/1", "ai":{"AD":"0,1", "DP":NumberInt(1), "PL":"44,3,0"}},
	{"gt":"0/0", "ai":{"AD":"2,0", "DP":NumberInt(2), "PL":"0,6,89"}}
];

var sampleVrdRecord = {
  	"sp": {},
    "ty": "SNP", 
    "rp": { "ch" : "1"}, 
    "ka": ["T", "C"], 
    "ai": {"AC":NumberInt(20),"MQ":NaN,"AF":0.385,"InbreedingCoeff":-0.0984,"MLEAC":NumberInt(23),"ExcessHet":0.3087,"MLEAF":0.442,"DP":NumberInt(56),"AN":NumberInt(52),"FS":0.0,"QD":31.82,"SOR":0.488,"qual":854.31}, 
    "_class" : "R"
}

for (var i=0; i<300; i++) {
  var docsToInsert = [];
  for (var k=0; k<300; k++) {
    var docToInsert = JSON.parse(JSON.stringify(sampleVrdRecord));
    for (var j=0; j<300; j++) {
      	var rndPos = Math.floor(Math.random() * 30000000) + 1;
	    docToInsert.rp.ss = NumberLong(rndPos);
        docToInsert.sp[(j+1) + ""] = sampleGtArray[rndPos % sampleGtArray.length];
    }
    docsToInsert.push(docToInsert);
  }
  db.vrd.insert(docsToInsert);
}

and below a query to run on it:

db.vrd.aggregate([
{
        "$project": {"g0" : [
                          "$sp.1.gt","$sp.2.gt","$sp.3.gt","$sp.4.gt","$sp.5.gt","$sp.6.gt","$sp.7.gt","$sp.8.gt","$sp.9.gt","$sp.10.gt"
                          ,"$sp.11.gt","$sp.12.gt","$sp.13.gt","$sp.14.gt","$sp.15.gt","$sp.16.gt","$sp.17.gt","$sp.18.gt","$sp.19.gt","$sp.20.gt"
                          ,"$sp.21.gt","$sp.22.gt","$sp.23.gt","$sp.24.gt","$sp.25.gt","$sp.26.gt","$sp.27.gt","$sp.28.gt","$sp.29.gt","$sp.30.gt"
                          ,"$sp.31.gt","$sp.32.gt","$sp.33.gt","$sp.34.gt","$sp.35.gt","$sp.36.gt","$sp.37.gt","$sp.38.gt","$sp.39.gt","$sp.40.gt"
                          ,"$sp.41.gt","$sp.42.gt","$sp.43.gt","$sp.44.gt","$sp.45.gt","$sp.46.gt","$sp.47.gt","$sp.48.gt","$sp.49.gt","$sp.50.gt"
                          ,"$sp.51.gt","$sp.52.gt","$sp.53.gt","$sp.54.gt","$sp.55.gt","$sp.56.gt","$sp.57.gt","$sp.58.gt","$sp.59.gt","$sp.60.gt"
                          ,"$sp.61.gt","$sp.62.gt","$sp.63.gt","$sp.64.gt","$sp.65.gt","$sp.66.gt","$sp.67.gt","$sp.68.gt","$sp.69.gt","$sp.70.gt"
                          ,"$sp.71.gt","$sp.72.gt","$sp.73.gt","$sp.74.gt","$sp.75.gt","$sp.76.gt","$sp.77.gt","$sp.78.gt","$sp.79.gt","$sp.80.gt"
                          ,"$sp.81.gt","$sp.82.gt","$sp.83.gt","$sp.84.gt","$sp.85.gt","$sp.86.gt","$sp.87.gt","$sp.88.gt","$sp.89.gt","$sp.90.gt"
                          ,"$sp.91.gt","$sp.92.gt","$sp.93.gt","$sp.94.gt","$sp.95.gt","$sp.96.gt","$sp.97.gt","$sp.98.gt","$sp.99.gt","$sp.100.gt"
                          ,"$sp.101.gt","$sp.102.gt","$sp.103.gt","$sp.104.gt","$sp.105.gt","$sp.106.gt","$sp.107.gt","$sp.108.gt","$sp.109.gt","$sp.110.gt"
                          ,"$sp.111.gt","$sp.112.gt","$sp.113.gt","$sp.114.gt","$sp.115.gt","$sp.116.gt","$sp.117.gt","$sp.118.gt","$sp.119.gt","$sp.120.gt"
                          ,"$sp.121.gt","$sp.122.gt","$sp.123.gt","$sp.124.gt","$sp.125.gt","$sp.126.gt","$sp.127.gt","$sp.128.gt","$sp.129.gt","$sp.130.gt"
                          ,"$sp.131.gt","$sp.132.gt","$sp.133.gt","$sp.134.gt","$sp.135.gt","$sp.136.gt","$sp.137.gt","$sp.138.gt","$sp.139.gt","$sp.140.gt"
                          ,"$sp.141.gt","$sp.142.gt","$sp.143.gt","$sp.144.gt","$sp.145.gt","$sp.146.gt","$sp.147.gt","$sp.148.gt","$sp.149.gt","$sp.150.gt"
                          ,"$sp.151.gt","$sp.152.gt","$sp.153.gt","$sp.154.gt","$sp.155.gt","$sp.156.gt","$sp.157.gt","$sp.158.gt","$sp.159.gt","$sp.160.gt"
                          ,"$sp.161.gt","$sp.162.gt","$sp.163.gt","$sp.164.gt","$sp.165.gt","$sp.166.gt","$sp.167.gt","$sp.168.gt","$sp.169.gt","$sp.170.gt"
                          ,"$sp.171.gt","$sp.172.gt","$sp.173.gt","$sp.174.gt","$sp.175.gt","$sp.176.gt","$sp.177.gt","$sp.178.gt","$sp.179.gt","$sp.180.gt"
                          ,"$sp.181.gt","$sp.182.gt","$sp.183.gt","$sp.184.gt","$sp.185.gt","$sp.186.gt","$sp.187.gt","$sp.188.gt","$sp.189.gt","$sp.190.gt"
                          ,"$sp.191.gt","$sp.192.gt","$sp.193.gt","$sp.194.gt","$sp.195.gt","$sp.196.gt","$sp.197.gt","$sp.198.gt","$sp.199.gt","$sp.200.gt"
                        ]}
    },
    {
        "$project": {
            "r": {
                "$let": {
                    "vars": { "gt0": "$g0" },
                    "in": {
                        "$let": {
                            "vars": {
                                "a0": {
                                    "$sum": {
                                        "$map": {
                                            "input": "$$gt0",
                                            "as": "g",
                                            "in": { "$add": [ 1, { "$cmp": ["$$g", "0/1"] } ] }
                                        }
                                    }
                                },
                                "m0": {
                                    "$subtract": [
                                        200,
                                        {
                                            "$sum": {
                                                "$map": {
                                                    "input": "$$gt0",
                                                    "as": "g",
                                                    "in": { "$max": [ 0, { "$cmp": ["$$g", null] } ] }
                                                }
                                            }
                                        }
                                    ]
                                }
                            },
                            "in": {
                                "f0": {
                                    "$divide": [
                                        { "$multiply": [ "$$a0", 50 ] },
                                        { "$cond": [ { "$eq": ["$$m0", 200] }, null, { "$subtract": [200, "$$m0"] } ] }
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    {
        "$match": {
            "$and": [ {
                    "$or": [
                        { "$and": [ { "r.f0": {"$gte": 35 } }, { "r.f0": { "$lte": 40 } } ] },
                        { "$and": [ { "r.f0": {"$lte": 65 } }, { "r.f0": { "$gte": 60 } } ] }
                    ]
                } ]
        }
    },
    { "$count": "count" }
]);

On my computer, query is over 3x slower to run on v4.4.20 than on v4.2.24. From previous tests I made, I am pretty sure versions 5, 6, 7 behave the same as 4.2.

Here is the explain() output for v4.2.24

{ 
    "stages" : [
        {
            "$cursor" : {
                "query" : {

                }, 
                "fields" : {
                    "sp.1.gt" : 1.0, 
                    "sp.10.gt" : 1.0, 
                    (etc... truncating!)
                    "_id" : 1.0
                }, 
                "queryPlanner" : {
                    "plannerVersion" : 1.0, 
                    "namespace" : "mgdb2_AATTOL_rnaseq.vrd", 
                    "indexFilterSet" : false, 
                    "parsedQuery" : {

                    }, 
                    "queryHash" : "8B3D4AB8", 
                    "planCacheKey" : "8B3D4AB8", 
                    "winningPlan" : {
                        "stage" : "COLLSCAN", 
                        "direction" : "forward"
                    }, 
                    "rejectedPlans" : [

                    ]
                }
            }
        }, 
        {
            "$project" : {
                "_id" : true, 
                "g0" : [
                    "$sp.1.gt",
                    (etc... truncating!)
                ]
            }
        }, 
        {
            "$project" : {
                "_id" : true, 
                "r" : {
                    "$let" : {
                        "vars" : {
                            "gt0" : "$g0"
                        }, 
                        "in" : {
                            "$let" : {
                                "vars" : {
                                    "a0" : {
                                        "$sum" : [
                                            {
                                                "$map" : {
                                                    "input" : "$$gt0", 
                                                    "as" : "g", 
                                                    "in" : {
                                                        "$add" : [
                                                            {
                                                                "$cmp" : [
                                                                    "$$g", 
                                                                    {
                                                                        "$const" : "0/1"
                                                                    }
                                                                ]
                                                            }, 
                                                            {
                                                                "$const" : 1.0
                                                            }
                                                        ]
                                                    }
                                                }
                                            }
                                        ]
                                    }, 
                                    "m0" : {
                                        "$subtract" : [
                                            {
                                                "$const" : 200.0
                                            }, 
                                            {
                                                "$sum" : [
                                                    {
                                                        "$map" : {
                                                            "input" : "$$gt0", 
                                                            "as" : "g", 
                                                            "in" : {
                                                                "$max" : [
                                                                    {
                                                                        "$cmp" : [
                                                                            "$$g", 
                                                                            {
                                                                                "$const" : null
                                                                            }
                                                                        ]
                                                                    }, 
                                                                    {
                                                                        "$const" : 0.0
                                                                    }
                                                                ]
                                                            }
                                                        }
                                                    }
                                                ]
                                            }
                                        ]
                                    }
                                }, 
                                "in" : {
                                    "f0" : {
                                        "$divide" : [
                                            {
                                                "$multiply" : [
                                                    "$$a0", 
                                                    {
                                                        "$const" : 50.0
                                                    }
                                                ]
                                            }, 
                                            {
                                                "$cond" : [
                                                    {
                                                        "$eq" : [
                                                            "$$m0", 
                                                            {
                                                                "$const" : 200.0
                                                            }
                                                        ]
                                                    }, 
                                                    {
                                                        "$const" : null
                                                    }, 
                                                    {
                                                        "$subtract" : [
                                                            {
                                                                "$const" : 200.0
                                                            }, 
                                                            "$$m0"
                                                        ]
                                                    }
                                                ]
                                            }
                                        ]
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }, 
        {
            "$match" : {
                "$or" : [
                    {
                        "$and" : [
                            {
                                "r.f0" : {
                                    "$gte" : 35.0
                                }
                            }, 
                            {
                                "r.f0" : {
                                    "$lte" : 40.0
                                }
                            }
                        ]
                    }, 
                    {
                        "$and" : [
                            {
                                "r.f0" : {
                                    "$lte" : 65.0
                                }
                            }, 
                            {
                                "r.f0" : {
                                    "$gte" : 60.0
                                }
                            }
                        ]
                    }
                ]
            }
        }, 
        {
            "$group" : {
                "_id" : {
                    "$const" : null
                }, 
                "count" : {
                    "$sum" : {
                        "$const" : 1.0
                    }
                }
            }
        }, 
        {
            "$project" : {
                "_id" : false, 
                "count" : true
            }
        }
    ], 
    "serverInfo" : {
        "host" : "CIRT829", 
        "port" : 59393.0, 
        "version" : "4.2.24", 
        "gitVersion" : "5e4ec1d24431fcdd28b579a024c5c801b8cde4e2"
    }, 
    "ok" : 1.0
}

and for v4.4.20

{ 
    "stages" : [
        {
            "$cursor" : {
                "queryPlanner" : {
                    "plannerVersion" : 1.0, 
                    "namespace" : "mgdb2_AATTOL_rnaseq.vrd", 
                    "indexFilterSet" : false, 
                    "parsedQuery" : {

                    }, 
                    "queryHash" : "9B03E893", 
                    "planCacheKey" : "9B03E893", 
                    "winningPlan" : {
                        "stage" : "PROJECTION_DEFAULT", 
                        "transformBy" : {
                            "_id" : true, 
                            "g0" : [
                                "$sp.1.gt",
                                (etc... truncating!)
                            ]
                        }, 
                        "inputStage" : {
                            "stage" : "COLLSCAN", 
                            "direction" : "forward"
                        }
                    }, 
                    "rejectedPlans" : [

                    ]
                }
            }
        }, 
        {
            "$project" : {
                "_id" : true, 
                "r" : {
                    "$let" : {
                        "vars" : {
                            "gt0" : "$g0"
                        }, 
                        "in" : {
                            "$let" : {
                                "vars" : {
                                    "a0" : {
                                        "$sum" : [
                                            {
                                                "$map" : {
                                                    "input" : "$$gt0", 
                                                    "as" : "g", 
                                                    "in" : {
                                                        "$add" : [
                                                            {
                                                                "$cmp" : [
                                                                    "$$g", 
                                                                    {
                                                                        "$const" : "0/1"
                                                                    }
                                                                ]
                                                            }, 
                                                            {
                                                                "$const" : 1.0
                                                            }
                                                        ]
                                                    }
                                                }
                                            }
                                        ]
                                    }, 
                                    "m0" : {
                                        "$subtract" : [
                                            {
                                                "$const" : 200.0
                                            }, 
                                            {
                                                "$sum" : [
                                                    {
                                                        "$map" : {
                                                            "input" : "$$gt0", 
                                                            "as" : "g", 
                                                            "in" : {
                                                                "$max" : [
                                                                    {
                                                                        "$cmp" : [
                                                                            "$$g", 
                                                                            {
                                                                                "$const" : null
                                                                            }
                                                                        ]
                                                                    }, 
                                                                    {
                                                                        "$const" : 0.0
                                                                    }
                                                                ]
                                                            }
                                                        }
                                                    }
                                                ]
                                            }
                                        ]
                                    }
                                }, 
                                "in" : {
                                    "f0" : {
                                        "$divide" : [
                                            {
                                                "$multiply" : [
                                                    "$$a0", 
                                                    {
                                                        "$const" : 50.0
                                                    }
                                                ]
                                            }, 
                                            {
                                                "$cond" : [
                                                    {
                                                        "$eq" : [
                                                            "$$m0", 
                                                            {
                                                                "$const" : 200.0
                                                            }
                                                        ]
                                                    }, 
                                                    {
                                                        "$const" : null
                                                    }, 
                                                    {
                                                        "$subtract" : [
                                                            {
                                                                "$const" : 200.0
                                                            }, 
                                                            "$$m0"
                                                        ]
                                                    }
                                                ]
                                            }
                                        ]
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }, 
        {
            "$match" : {
                "$or" : [
                    {
                        "$and" : [
                            {
                                "r.f0" : {
                                    "$gte" : 35.0
                                }
                            }, 
                            {
                                "r.f0" : {
                                    "$lte" : 40.0
                                }
                            }
                        ]
                    }, 
                    {
                        "$and" : [
                            {
                                "r.f0" : {
                                    "$lte" : 65.0
                                }
                            }, 
                            {
                                "r.f0" : {
                                    "$gte" : 60.0
                                }
                            }
                        ]
                    }
                ]
            }
        }, 
        {
            "$group" : {
                "_id" : {
                    "$const" : null
                }, 
                "count" : {
                    "$sum" : {
                        "$const" : 1.0
                    }
                }
            }
        }, 
        {
            "$project" : {
                "count" : true, 
                "_id" : false
            }
        }
    ], 
    "serverInfo" : {
        "host" : "CIRT829", 
        "port" : 59393.0, 
        "version" : "4.4.20", 
        "gitVersion" : "5b82316e5017ff76646fd12878079620f9ad4be7"
    }, 
    "ok" : 1.0
}

I forgot to specify that simply switching “sp” field type to array and working with indices instead of keys is not a solution since in a real DB the keys contained in that field are not always the same (some documents may contain a given range, some others another range, and so on…)