Best way to batch queries

Alex_Coleman · January 20, 2022, 1:58am

What is the best way to batch multiple queries into a single request? My goal is to avoid incurring the multiplied latency from serial requests, but also avoid opening too many connections for parallel requests. Ideally I would be able to say db.bulkRead([query1, query2, query3]) and get back a (potentially paginated) list of results.

I’m also curious if the driver automatically does something like this (i.e. if I send two requests with maxPoolSize=1, will the driver wait for the first request to fully complete before issuing the second request? Or could it send both across a single connection, and just get the response streamed in after?). I would prefer a system where I could choose what to batch to ensure large queries aren’t mixed into batches of cheap queries.

From my research so far, it seems like this exists but only with bulkWrites, and I’m not sure why it wouldn’t exist for bulkRead as well.

Pavel_Duchovny · January 20, 2022, 6:25am

Hi @Alex_Coleman ,

Making the poolsize to one will not increase on parallel execution of queries. On the other hand it will harm it.

The best way to batch data fetch in MongoDB is to store it embedded in documents. Meaning that if you need to always fetch related data together store it together in a single document. If not possible store it in a chain of documents in a single collection with a predict that you can identify for example an id for a batch grab indexed.

Other less recommended technics are using a unionWith aggregation or a $facet search with multiple facets as batches or $lookup to access different collections in a single query.

I recommend reading some of our blogs on data design

Thanks
Pavel

Alex_Coleman · January 20, 2022, 6:07pm

Hi Pavel,

Thanks for the answer. I understand that a higher poolsize is best for parallel execution of queries, but I was just curious the exact effect of using a single connection. In particular, if execution was serial or just the writing of query/response to the DB was serial (which would be less of an issue since it doesn’t incur latency cost twice).

I agree its best to store related data together in similar documents and such. However, across a large application issuing many queries it would be nice if there was a way to batch unrelated queries. Facet seems like exactly the answer I’m looking for, but it doesn’t support indexes on the contents. Do you know if there’s any way to do something similar? Basically I just want a {$or: [$query1, $query2]}, but with the results of each query labeled. Thanks!

Pavel_Duchovny · January 21, 2022, 9:24am

Hi @Alex_Coleman ,

With facets you can achieve several answers each is labled at another field.

You can run both and than use a $project to return only one based on a condition.

Be cautious that since data is returned in a single document for all queries it cannot cross 16MB of returning data so use limit if necessary to fetch just a portion of the data.

Otherwise, you can use your clients to run many queries in parallel from the driver stand point. I used promise arrays to run it via node js.

Thanks
Pavel