Is querying a collection based on _id faster than other field with b-tree jndex

Deepak_Rai · April 3, 2022, 5:44pm

Hi all,

I am pretty new to Mongodb and to this forum. I need some help in understanding behavior of Mongodb query performance in case of search using $in operator.
So in case 1 I am passing an array of possible _id values to search matching documents in Mongodb. The array can contain 100+ to 1000 elements.
In case 2 I am trying to filter on another unique field called OrgId by passing an array of possible OrgId values having 100+ to 1000 elements.

Case1:
A.find( { _id : { $in:[ObjectId(xx), ObjectId(xy)…] } } )

Case 2:

A.find( { OrgId : { $in:[“123”,“234”, …] } } )

I observed that in case 2 the response takes 5-10 seconds while in case 1 response comes back in 1-2 secs even though OrgId field also has a unique index created on it. Both cases number of array elements passed is 100.
I did not understand reason behind this. Can anyone help explain this.

PS: Both _id and OrgId fields will point to same set of documents in the query.

-Deepak

Pavel_Duchovny · April 4, 2022, 5:22am

Hi @Deepak_Rai ,

The _id is indeed a little bit optimised for cheery picking as it is used for replication as well. This index will usually maintain a better natural insert order if generated with auto ObjectId

In order to better understand the difference we need to compare execution plans with executionStats enabled.

Now what I can say is that a 100 $in operations is not that good as they are translated to range and not equality index operations …

Is there a way to change your schema and have a more selective query like parentOrg : 1 which will have all associated orgs under it… Why do you need all 100 elements in one query?

Thanks

Deepak_Rai · April 5, 2022, 11:21am

Hey, Thanks for responding. I will explain the background so actually we have a requirement to do fuzzy search on OrgName field and after multiple discussions with Mongodb product team it was suggested that for our use case Mongodb does not efficiently query (partial, fuzzy) the db to return a fast response and they suggested us to use the elasticsearch layer instead on top of Mongodb. In elasticsearch we store the searchable attributes (orgId, OrgName, countryCode) and when we search based on OrgName in ES we return an array of OrgIds on the basis of which we need to do another search in Mongodb to return the other attributes required by user. So for this case we needed to pass and array of OrgIds to the query that I explained in my original post and wanted to understand a better solution.

And regarding schema design, we already have the schema design freezed which currently suits lot of other use cases so thereby any change in schema would be the last resort for us.
Also we are using Mongodb 4.2 native and not the Atlas , as it is currently not part of our Orgs tech stack.
I hope I am able to clarify my requirement

Deepak_Rai · April 5, 2022, 11:23am

Also, the executionstats look similar for both cases with indexscan and number of docs analyzed are also quite close. Will share the explain plan shortly.

Pavel_Duchovny · April 5, 2022, 11:36am

Hi @Deepak_Rai

MongoDB Atlas search has great fuzzy text searching capabilities with optimised access pattern for text…

I would highly recommend considering Atlas and Atlas search for this scenario

This will solve the multiple stacked technology which actually complex your life…

Passing those large $in are not that great… Perhaps consider chunking them into smaller.(~10) per query and run multiple queries instead perhaps

Thanks
Pavel

Deepak_Rai · April 6, 2022, 12:24pm

Thanks Pavel for response… however we have a limitation that we cannot use MongoDB Atlas since it’s still not available for use in the organisation. I will try to check how searching with a smaller chunk of OrgIDs array will help.

system · April 11, 2022, 12:24pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.