PyMongoArrow 0.6.2 Released

Steve_Silvester · November 16, 2022, 11:29pm

We are pleased to announce the 0.6.2 release of PyMongoArrow, a PyMongo extension containing tools for loading MongoDB query result sets as Apache Arrow tables, Pandas DataFrames, and NumPy arrays.

This is a minor release that brings support for PyArrow 10.0. We did not publish 0.6.0 or 0.6.1 due to technical errors.

See the changelog for a high-level summary of what’s new and improved, or see the 0.6.2 release notes in JIRA for the complete list of resolved issues.

Documentation: PyMongoArrow 0.6.2 Documentation
Changelog: Changelog
Source: GitHub

Thank you to everyone who contributed to this release!

Sanjay_Dasgupta · November 18, 2022, 5:01am

I took MongoDB University’s PyMongoArrow course yesterday, and then realised that support for many types is still missing.

On the other hand, the same functionality (with support for all Python types) is already provided by Pandas through its DataFrame constructor. The Python “dict” objects returned by PyMongo’s “find()” method (see “Build A Python Database With MongoDB”), collected into a list, can be passed directly to the DataFrame constructor.
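
As a rough sketch of the DataFrame-constructor route described here: the list below is a hypothetical stand-in for `list(collection.find(...))`, so the snippet runs without a MongoDB server. The field names (`item`, `qty`) are made up for illustration.

```python
import pandas as pd

# Hypothetical documents, shaped like the dicts PyMongo's find() yields.
docs = [
    {"_id": 1, "item": "apple", "qty": 5},
    {"_id": 2, "item": "pear", "qty": 12},
    {"_id": 3, "item": "plum"},  # this document has no "qty" field
]

# The constructor takes the union of keys as columns, in first-seen order.
df = pd.DataFrame(docs)

# The missing "qty" becomes NaN, which upcasts the integer column to float64.
print(df.dtypes)
```

Note that the column types are inferred by Pandas from the Python objects rather than declared up front.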

So, what is the need for, or advantage of, using PyMongoArrow?

Steve_Silvester · November 18, 2022, 10:23pm

Hi @Sanjay_Dasgupta, thank you for the question, and for opening GitHub issue #107 in mongodb-labs/mongo-arrow, “Documentation should describe advantages over DataFrame constructor (of Pandas)”.

For completeness, we’re tracking this issue in https://jira.mongodb.org/browse/ARROW-129, summarized as:

We should list the pros and cons of using this library versus using the PyMongo API directly, highlighting the benchmarks as well as the limitations.

We should give examples showing how the same tasks could be accomplished with each.
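
A minimal sketch of what such a side-by-side example might look like, not the official documentation example. It assumes a collection whose documents carry a hypothetical integer `qty` field and a float `amount` field; both helper function names are made up here. The PyMongoArrow path uses `Schema` and `find_pandas_all` from `pymongoarrow.api`.

```python
import pandas as pd


def load_with_pymongo(collection, query):
    """Plain PyMongo: every document becomes a Python dict before Pandas sees it."""
    # Hypothetical fields; "_id" is excluded to keep the two results comparable.
    cursor = collection.find(query, projection={"_id": 0, "qty": 1, "amount": 1})
    return pd.DataFrame(list(cursor), columns=["qty", "amount"])


def load_with_pymongoarrow(collection, query):
    """PyMongoArrow: results are loaded into Arrow according to a declared schema."""
    # Imported here so the PyMongo-only path works without pymongoarrow installed.
    from pymongoarrow.api import Schema, find_pandas_all

    # The schema names the fields to load and their types (assumed for this sketch).
    schema = Schema({"qty": int, "amount": float})
    return find_pandas_all(collection, query, schema=schema)
```

Both helpers would be called the same way, for example `load_with_pymongo(client.db.data, {"amount": {"$gt": 0}})`, where `client.db.data` is a placeholder for a real collection. The main difference is that the first infers column types from Python objects, while the second takes them from the schema.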
