I want to extract documents from my collection collection_name. The collection is growing day by day and currently contains nearly 33 million documents.
Basically, my task is to mine data from this collection, do some preprocessing, and then perform some analysis on the preprocessed data.
The preprocessing itself doesn't take much time, but extracting the specific data with the following find query takes a very long time.
To overcome this issue, I started using skip() and limit() inside a for loop (the loop changes the skip value on each iteration), which lets me extract the data in batches of 5 lakh (500,000) documents; see the loop sketch after the query below.
Data Extraction Query:
db.getCollection("collection_name").find(
    {},
    { "name": 1, "phone": 1, "mobile": 1, "fax": 1, "contact number": 1, "state": 1 }
).sort({_id: -1}).skip(0).limit(500000);
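For context, the surrounding loop looks roughly like this (a minimal sketch of my batching approach; processBatch is a hypothetical placeholder for my preprocessing step, not my actual code):

Batching Loop (sketch):
var batchSize = 500000;  // 5 lakh documents per batch
var total = db.getCollection("collection_name").countDocuments({});
for (var skip = 0; skip < total; skip += batchSize) {
    // fetch the next batch, projecting only the fields I need
    var batch = db.getCollection("collection_name").find(
        {},
        { "name": 1, "phone": 1, "mobile": 1, "fax": 1, "contact number": 1, "state": 1 }
    ).sort({_id: -1}).skip(skip).limit(batchSize).toArray();
    processBatch(batch);  // hypothetical preprocessing step
}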
But this approach also takes a long time (nearly 3 hours to run through all the iterations and fetch all 33M documents) and puts a heavy load on my database. Is there any way to speed up this data extraction query?