What is the best way to run data analytics when using Python Motor (Async Driver)?

Hey folks,

I am very new to MongoDB. My use case involves numerous collections, each containing numerous documents (1000+). What is the best way to run analytics at scale? The options I see so far are as follows:

  1. Use aggregation pipelines in MongoDB
  2. Use PyMongoArrow (but not sure how that interfaces with Motor)

Am I missing any options? Also, which is most recommended in terms of best practices and scalability? If it is 2, how can I make it work with Motor?

Thank you.

Hello, we do not yet support async clients in PyMongoArrow. This feature is tracked in https://jira.mongodb.org/browse/ARROW-198.

Noted. What is the best option to run analytics across multiple collections and documents?

Using pipelines in MongoDB, as you suggested.
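
For illustration, here is a minimal sketch of what running an aggregation pipeline through Motor could look like. The field names (`status`, `customer_id`, `amount`) and the helper names are hypothetical and only stand in for whatever your documents actually contain. The functions take a collection or database object as an argument, so with Motor you would pass in something like `AsyncIOMotorClient(uri)["mydb"]`. The grouping and summing happen on the server, and only the aggregated rows come back to Python.

```python
import asyncio
from typing import Any


def build_pipeline(min_total: float) -> list[dict[str, Any]]:
    """Build an aggregation pipeline (field names are hypothetical)."""
    return [
        # Keep only the documents of interest before grouping.
        {"$match": {"status": "complete"}},
        # Aggregate server-side: one output row per customer.
        {"$group": {
            "_id": "$customer_id",
            "total": {"$sum": "$amount"},
            "orders": {"$sum": 1},
        }},
        # Filter on the aggregated value, then sort descending.
        {"$match": {"total": {"$gte": min_total}}},
        {"$sort": {"total": -1}},
    ]


async def run_analytics(collection, min_total: float = 0.0) -> list[dict[str, Any]]:
    """Run the pipeline on one collection (e.g. a Motor collection)."""
    # In Motor, aggregate() returns a cursor immediately; it is not awaited.
    cursor = collection.aggregate(build_pipeline(min_total))
    # Awaiting to_list() fetches the (already aggregated) result rows.
    return await cursor.to_list(length=None)


async def analyze_all(db, names: list[str], min_total: float = 0.0) -> dict[str, list]:
    """Run the same pipeline concurrently on several collections."""
    results = await asyncio.gather(
        *(run_analytics(db[name], min_total) for name in names)
    )
    return dict(zip(names, results))
```

With a real deployment this would be called as something like `await analyze_all(client["mydb"], ["orders_2023", "orders_2024"])`, where the database and collection names are again placeholders. If you need results combined across collections in a single query, stages such as `$unionWith` or `$lookup` may be worth a look as well.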

Got it. As a follow-up, when should someone use PyMongoArrow for data analytics / transformations rather than Mongo's pipelines? Is it only when the input is stored as Pandas/NumPy/Apache Arrow, or is there another reason to use PyMongoArrow (efficiency, additional capabilities, etc.)? Thanks.

We have a comparison page.

In general, PyMongoArrow is faster and uses less memory when dealing with larger, un-nested documents. We will continue to improve PyMongoArrow over time.