Indexed query performing poorly with large documents

Hi guys,

Recently I encountered an issue for which I haven’t found an explanation as yet. I have a collection with a simple one property index. The collection is not huge, only few thousands of documents. The issue I encountered is that when some of the documents grow in size, I’m assuming we are close to the document size limit, the index starts performing very poorly. When I use the explain() cursor method with executionStats mode I can see that the winning plan is opting for the index and the result is pretty fast as expected. However this is not the case when I run the query in mongoshell or some other DB tool. When doing so, the index is either not being used or performs very poorly. On the other hand when I reduce the documents size to a decent size the query returns efficiently. Can someone explain why this is so please? To my knowledge (or at least in other databases) an index is pointer based so it should perform the same irrespective of document size. Any insight would be highly appreciated. Thank you.

Andre

From the little information you supplied about your configuration, my first guess would be that you do not have enough RAM and your big documents needs to be read from disk.

Well I doubt that is the cause. It’s a dedicated server with 16gb of ram. The collection is not huge, I managed to reproduce the problem with a collection of just 257 documents for example. The average object size as reported by mongodb being 59541.

Welcome to the MongoDB Community Forums @Andre_Cassar!

Were you able to find a solution for your query performance?

If not, can you provide some more details on your query and environment including:

  • specific version of MongoDB server (eg as reported by db.version() in the mongo shell)
  • output of explain(allPlansExecution) for the query that is performing poorly
  • where your mongo shell or DB tool is running relative to the MongoDB server (eg connecting remotely or via localhost).
  • value of mongod's storage.wiredTiger.engineConfig.cacheSizeGB if not set to the default

Regards,
Stennie

Hi Stennie,

Thanks for the welcome and feedback. I haven’t fully resolved it, however coming to think of it, it must be a network latency issue. Given that the document size is big, it is taking some time to transfer it, over 2 seconds. I did not take that into consideration initially. My database server and application server reside in the same data centre. There is a 100Mbps+ bandwidth between the two, yet the result take some seconds to be returned to the application. As a solution we are now working on reducing the size of the document by moving out some data into a separate collection. Initially I was baffled as it was not making sense at all when having an index.

Thanks again.
Regards,
Andre

Hi Andre,

If you suspect network transfer time is the main issue, I would try projecting fields to reduce the result set size and confirm before restructuring your data.

You may find the Schema Design Anti-Patterns series a helpful read. One of the anti-patterns described is bloated documents: storing large amounts of data together in a document when that data is not frequently accessed together also impacts the size of your working set in RAM.

It would also be worth reviewing allPlansExecution for your query (if you haven’t already), as there may be multiple candidate indexes without a consistent winner for your query shape. This is perhaps unlikely given the apparent correlation of document size to your performance concern, but you could verify to remove this possibility from consideration.

Regards,
Stennie