PyMongo cursor to DataFrame conversion taking a long time

I am using MongoDB to store data. When I query the database, the result comes back as a PyMongo cursor, which I need to convert to a DataFrame. I have tried two methods, but both take 2-4 seconds, even for as little as one record. Is there a faster way to convert a cursor to a DataFrame?

Welcome to the MongoDB Community @Likith_sai !

What versions of Python and PyMongo are you using?

If it takes this long to retrieve a single document, I would suspect one or more of the following:

  • you are retrieving a document from a large collection using a query that isn’t properly supported by an index

  • your document is large or highly nested (so converting it into a DataFrame may take significant time)

  • your measure of time includes application overhead that is unrelated to the database query processing time
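One way to tell these causes apart is to time the fetch and the conversion as separate stages. The following is a minimal sketch, not a definitive benchmark; `time_stages` and its arguments are hypothetical names introduced here for illustration.

```python
import time


def time_stages(fetch, convert):
    """Run fetch() and convert(docs) separately and report how long each took.

    fetch   -- zero-argument callable returning a list of documents
    convert -- callable turning that list into the final result
    """
    t0 = time.perf_counter()
    docs = fetch()            # query + network transfer + BSON decoding
    t1 = time.perf_counter()
    result = convert(docs)    # e.g. building a DataFrame
    t2 = time.perf_counter()
    timings = {
        "fetch_s": t1 - t0,
        "convert_s": t2 - t1,
        "n_docs": len(docs),
    }
    return result, timings
```

With PyMongo and pandas this might be called as `time_stages(lambda: list(collection.find(query)), pandas.DataFrame)`, where `collection` and `query` stand in for your own objects. Wrapping `find()` in `list(...)` matters because a cursor is lazy: the query only runs once the cursor is iterated. Creating the `MongoClient` and importing pandas should be kept outside the timed region, since both can add noticeable one-off startup cost.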

Please share some more details about your environment:

  • how you are measuring the 2-4 seconds and what that includes (query time, time to fetch results over the network, time for your function to execute, etc.)

  • explain('executionStats') output for the query you are running to fetch results

  • snippet of Python code showing how you are fetching and processing the results

  • average size (in bytes) and complexity (number of fields, levels of nesting) of the document you are fetching

  • the type of MongoDB deployment you are connecting to (local or remote relative to your Python code; standalone, replica set, or sharded cluster)
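For the explain output requested above, a small helper can pull out the fields that usually matter most. This is a rough sketch that assumes the classic explain layout from a non-sharded deployment (`queryPlanner.winningPlan` with nested `inputStage` entries); sharded clusters and newer server versions may nest the plan differently. The helper names are hypothetical.

```python
def plan_stages(plan):
    """Collect stage names from a winningPlan tree (classic explain layout)."""
    stages = []
    stack = [plan]
    while stack:
        node = stack.pop()
        if "stage" in node:
            stages.append(node["stage"])
        if "inputStage" in node:
            stack.append(node["inputStage"])
        stack.extend(node.get("inputStages", []))
    return stages


def summarize_explain(explain):
    """Reduce an explain document to a handful of diagnostic numbers."""
    stats = explain.get("executionStats", {})
    plan = explain.get("queryPlanner", {}).get("winningPlan", {})
    stages = plan_stages(plan)
    return {
        "stages": stages,
        # COLLSCAN means the server scanned the collection instead of an index.
        "collection_scan": "COLLSCAN" in stages,
        "n_returned": stats.get("nReturned"),
        "docs_examined": stats.get("totalDocsExamined"),
        "keys_examined": stats.get("totalKeysExamined"),
        "server_time_ms": stats.get("executionTimeMillis"),
    }
```

In PyMongo the input document can be obtained with something like `db.command("explain", {"find": "orders", "filter": query}, verbosity="executionStats")`, where `db`, `"orders"`, and `query` are placeholders for your own database, collection name, and filter. A `COLLSCAN` stage, or a `docs_examined` value far larger than `n_returned`, suggests the query is not supported by an index. If `server_time_ms` is small while the overall call still takes seconds, the time is being spent outside the server.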

The MongoDB Python driver team maintains PyMongoArrow, an extension for PyMongo that returns query result sets as pandas DataFrame, Apache Arrow Table, or NumPy ndarray objects.

This is definitely a recommended approach if your goal is to produce a result set in any of the supported data formats, but if you currently have performance issues fetching a single document there is probably tuning required elsewhere to improve your outcomes.
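As a rough illustration of the PyMongoArrow route, the sketch below wraps `find_pandas_all` from `pymongoarrow.api`. The wrapper name `find_as_dataframe` is hypothetical, and the connection string, database, collection, and field names in the comments are placeholders rather than anything from this thread. Passing `schema=None` relies on schema inference, which is assumed here to be available in the installed PyMongoArrow version.

```python
def find_as_dataframe(collection, query, schema=None):
    """Run `query` on `collection` and return the matches as a pandas DataFrame."""
    # Imported lazily so this module still loads where PyMongoArrow is absent.
    from pymongoarrow.api import find_pandas_all

    return find_pandas_all(collection, query, schema=schema)


# Example usage (hypothetical URI, database, collection, and fields):
#   from pymongo import MongoClient
#   from pymongoarrow.api import Schema
#   coll = MongoClient("mongodb://localhost:27017")["shop"]["orders"]
#   schema = Schema({"amount": float, "qty": int})
#   df = find_as_dataframe(coll, {"status": "shipped"}, schema)
```

Supplying an explicit schema limits the result to the listed fields and lets the driver build typed columns directly, instead of creating a Python dict per document first.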

Regards,
Stennie


Hey @Likith_sai, the direction that @Stennie provided about PyMongoArrow is probably the way to go. But would you be able to share more information about what you’re doing with the data and what you’re looking to accomplish?

Feel free to throw some time on my calendar if it would be easier: Calendly - Benjamin Flast
