Continuously export data from MongoDB

Python program to export data continuously from MongoDB:

from pymongo import MongoClient

# Placeholder connection string and database name; adjust for your server.
client = MongoClient("mongodb://localhost:27017")
collection = client["database_name"]["collection_name"]

query = {
    # Some query
}

# To fetch only required fields, e.g. {"field_a": 1, "field_b": 1}.
# None returns every field.
projection = None

# Define batch size and total documents
batch_size = 500000
total_documents = collection.count_documents(query)

# Iterate through the data with skip and limit
current_skip = 0

while current_skip < total_documents:
    result = (
        collection.find(query, projection)
        .sort([("_id", -1)])
        .skip(current_skip)
        .limit(batch_size)
    )

    for document in result:
        # Export the document or process it as needed
        print(document)

    current_skip += batch_size

The code above is Python code that fetches data from my MongoDB server in batches of 5 lakh (500,000) records.
What I want to know is whether there is a way to run this kind of logic in MongoDB itself, without switching to Python.
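For reference, the skip/limit loop issues one query per batch. A small sketch that lists the skip offset of each query and how many documents it returns; `batch_windows` is a hypothetical helper for illustration, and the 1,200,000 total is a made-up example.

```python
from typing import List, Tuple


def batch_windows(total_documents: int, batch_size: int) -> List[Tuple[int, int]]:
    """Return (skip, documents_returned) for each query the skip/limit loop issues."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # One window per multiple of batch_size below the total; the last may be short.
    return [
        (skip, min(batch_size, total_documents - skip))
        for skip in range(0, total_documents, batch_size)
    ]


# Hypothetical total of 1,200,000 matching documents, batches of 500,000
print(batch_windows(1_200_000, 500_000))
# [(0, 500000), (500000, 500000), (1000000, 200000)]
```

This assumes the collection does not change between queries: if documents are inserted or deleted mid-export, the windows shift and some documents can be skipped or returned twice.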

Hey @Mehul_Sanghvi,

May I ask whether you are streaming the data from MongoDB to some other system? If not, please help me understand what you mean by “export data continuously”.

If you want to continuously export data from MongoDB and stream it to another system, such as Apache Kafka, you can use the MongoDB Connector for Apache Kafka, which integrates MongoDB with Kafka.

If you prefer not to use a programming language, you can run the query directly from the MongoDB shell or MongoDB Compass, both of which also let you write it as an aggregation pipeline.
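As a sketch of what that looks like, the find-with-sort-and-limit from the question maps onto aggregation stages as shown below. It is written in Python to match the question's code; `build_export_pipeline` is a hypothetical helper, and the list it returns has the same shape as the stages you would enter in Compass's aggregation editor or pass to `db.collection.aggregate()` in the shell.

```python
from typing import Any, Dict, List, Optional


def build_export_pipeline(
    query: Dict[str, Any],
    projection: Optional[Dict[str, Any]],
    limit: int,
) -> List[Dict[str, Any]]:
    """Mirror find(query, projection).sort({_id: -1}).limit(limit) as pipeline stages."""
    pipeline: List[Dict[str, Any]] = [
        {"$match": query},       # same filter as find()
        {"$sort": {"_id": -1}},  # descending _id, as in the original sort
        {"$limit": limit},       # cap the number of documents returned
    ]
    if projection:
        # $project needs at least one field, so only add it when fields are listed
        pipeline.append({"$project": projection})
    return pipeline
```

With pymongo this would be used as `collection.aggregate(build_export_pipeline(query, projection, 500_000))`. On its own the pipeline returns a single batch; paging through the rest still needs a filter that changes between runs.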

If you need further assistance, please share your use case and related details so we can help you better.

Regards,
Kushagra

@Kushagra_Kesav
I want to export data locally, the way we do in MongoDB Compass.
The problem is that I have to keep changing the values of skip and limit by hand. In Python I can define a loop that does this automatically, and I want the same kind of automation in my MongoDB query.

I hope that clarifies my question.

Hi @Mehul_Sanghvi,

In this scenario, I think using a programming language is much easier. May I ask whether you are facing any issues with the Python approach?
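Part of why the programming-language route is easier: a single `find()` cursor already fetches results from the server in batches as you iterate it, so a local export does not strictly need a skip/limit loop. A minimal sketch, assuming JSON Lines output (one document per line) is acceptable; `export_jsonl` is a hypothetical helper. `default=str` turns values such as ObjectId and datetime into plain strings, which loses their BSON types; pymongo's `bson.json_util.dumps` keeps them as Extended JSON if that matters.

```python
import json
from typing import Any, Iterable, Mapping


def export_jsonl(documents: Iterable[Mapping[str, Any]], path: str) -> int:
    """Write each document as one JSON line to path; return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for doc in documents:
            # default=str stringifies non-JSON types such as ObjectId or datetime
            out.write(json.dumps(dict(doc), default=str) + "\n")
            count += 1
    return count
```

With pymongo this would be called as `export_jsonl(collection.find(query, projection), "export.jsonl")`; the cursor's `batch_size()` method tunes how many documents each round trip to the server fetches.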

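However, I would suggest not using skip() here: it gets slower as the offset grows, because the server still has to walk past every skipped document before returning results. It’s preferable to page on the _id field when it’s an ObjectId (sortable), asking each batch only for _id values beyond the last one already fetched. Please refer to the Range Queries documentation to learn more.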
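A minimal sketch of that range-based approach (often called keyset pagination), written in Python to match the question. `iter_by_id_ranges` is a hypothetical helper; it assumes a pymongo-style collection (`find().sort().limit()`), an ascending sort on `_id`, and a projection that keeps `_id`. For the question's descending order, flip the sort direction to -1 and `$gt` to `$lt`.

```python
from typing import Any, Dict, Iterator, Optional


def iter_by_id_ranges(
    collection: Any,
    query: Dict[str, Any],
    projection: Optional[Dict[str, Any]],
    batch_size: int,
) -> Iterator[Dict[str, Any]]:
    """Yield all documents matching query, batch_size at a time, paging on _id."""
    last_id = None
    while True:
        if last_id is None:
            batch_filter = query
        elif query:
            # Combine the caller's filter with "after the last _id already seen"
            batch_filter = {"$and": [query, {"_id": {"$gt": last_id}}]}
        else:
            batch_filter = {"_id": {"$gt": last_id}}
        cursor = collection.find(batch_filter, projection).sort("_id", 1).limit(batch_size)
        returned = 0
        for doc in cursor:
            last_id = doc["_id"]  # requires the projection to keep _id
            returned += 1
            yield doc
        if returned < batch_size:
            return  # a short (or empty) batch means nothing is left
```

Because each batch starts from the last seen `_id` rather than from an offset, the `_id` index can jump straight to that point, so later batches do not get more expensive the way large skip() offsets do.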

You can also consider mongoexport if it fits your use case; it exports the documents matching a query (or a whole collection) to a JSON or CSV file in a single run.
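Since one mongoexport invocation exports every matching document, there is no skip/limit to adjust between runs. If you still want to drive it from a script, here is a sketch that only builds the command line. `build_mongoexport_args` is hypothetical, and the URI, database, collection and field names are placeholders; the flags themselves (`--uri`, `--collection`, `--query`, `--type`, `--out`, `--fields`) are standard mongoexport options.

```python
import json
from typing import Any, Dict, List, Optional, Sequence


def build_mongoexport_args(
    uri: str,
    collection: str,
    query: Dict[str, Any],
    out_path: str,
    fields: Optional[Sequence[str]] = None,
    file_type: str = "json",
) -> List[str]:
    """Build a mongoexport command line that writes all matching documents to out_path."""
    args = [
        "mongoexport",
        f"--uri={uri}",                  # the database name can be part of the URI
        f"--collection={collection}",
        f"--query={json.dumps(query)}",  # filter passed as a JSON string
        f"--type={file_type}",           # "json" or "csv" (csv also needs --fields)
        f"--out={out_path}",
    ]
    if fields:
        args.append("--fields=" + ",".join(fields))
    return args
```

The list can be passed to `subprocess.run(args, check=True)`. Plain JSON works for simple filters; filters on ObjectId or date values would need Extended JSON syntax (for example `{"$oid": ...}`) in the query string.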

Regards,
Kushagra
