I want to extract documents from my collection collection_name. The collection is growing day by day and currently contains nearly 33 million documents.
Basically, my task is to mine data from this collection, do some preprocessing, and then perform some analysis on the preprocessed data.
The preprocessing itself doesn't take much time, but extracting the specific data with the following find query takes a very long time.
To overcome this issue, I started using skip() and limit() inside a for loop (the loop changes the skip value on each iteration), which lets me extract the data in batches of 5 lakh (500,000) documents; see the loop sketch after the query below.
Data Extraction Query:
db.getCollection("collection_name").find(
    {},
    { "name": 1, "phone": 1, "mobile": 1, "fax": 1, "contact number": 1, "state": 1 }
).sort({_id: -1}).skip(0).limit(500000);
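For context, the surrounding loop looks roughly like this (a minimal sketch of my batching approach; processBatch is a hypothetical placeholder for my preprocessing step, not my actual code):

Batching Loop (sketch):
var batchSize = 500000;  // 5 lakh documents per batch
var total = db.getCollection("collection_name").countDocuments({});
for (var skip = 0; skip < total; skip += batchSize) {
    // fetch the next batch, projecting only the fields I need
    var batch = db.getCollection("collection_name").find(
        {},
        { "name": 1, "phone": 1, "mobile": 1, "fax": 1, "contact number": 1, "state": 1 }
    ).sort({_id: -1}).skip(skip).limit(batchSize).toArray();
    processBatch(batch);  // hypothetical preprocessing step
}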
But this approach also takes a long time (nearly 3 hours to run through all the iterations and fetch all 33M documents) and puts a heavy load on my database. Is there any way to speed up this data extraction query?