PyMongoArrow 0.1.1 Released

Shubham_Ranjan · July 12, 2022, 4:12pm

You can use an aggregation pipeline to export nested fields from MongoDB into any of the supported data formats.

For example, let’s say we want to export MongoDB data into a pandas DataFrame. We can use PyMongoArrow’s aggregate_pandas_all() function to do this.

Let’s say this is our sample document containing nested fields:

{'_id': ObjectId('62cd854a73939396fff10edd'), 'a': {'b': 1, 'c': 2}}
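In MongoDB, a dotted path such as a.b refers to the field b inside the embedded document a. The following is a rough pure-Python sketch of that lookup, written only to show which flat values we want to pull out of the sample document. The helper resolve_path() is hypothetical and handles embedded documents only. MongoDB's real path semantics also cover arrays, which this sketch ignores.

```python
# Hypothetical helper: a simplified, pure-Python stand-in for how MongoDB
# resolves a dotted field path such as "a.b" against embedded documents.
# Arrays are NOT handled here; MongoDB's real semantics for arrays differ.
def resolve_path(doc, path):
    value = doc
    for key in path.split('.'):
        if not isinstance(value, dict) or key not in value:
            return None  # missing field: this sketch simply returns None
        value = value[key]
    return value


# The sample document, with the ObjectId replaced by a plain string so the
# sketch runs without the bson package.
doc = {'_id': '62cd854a73939396fff10edd', 'a': {'b': 1, 'c': 2}}

# The flat row we would like to end up with: nested values under new names.
flat = {'ab': resolve_path(doc, 'a.b'), 'ac': resolve_path(doc, 'a.c')}
print(flat)  # {'ab': 1, 'ac': 2}
```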

Using a $project stage, we can promote the nested fields to top-level fields with new names, and then use those names to define the Schema. For example:

from pymongoarrow.api import Schema, aggregate_pandas_all
schema = Schema({'ab': int, 'ac': int})
df = aggregate_pandas_all(coll, [{'$project': {'ab': '$a.b', 'ac': '$a.c'}}], schema=schema)  # coll is a pymongo Collection
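When there are many nested fields, the $project stage and the Schema have to list the same names. A minimal sketch, assuming you describe each field once as a new name mapped to a (dotted path, type) pair: the hypothetical helper build_flatten_pipeline() (not part of PyMongoArrow) derives both pieces from that single mapping.

```python
# Hypothetical helper (not part of PyMongoArrow): builds the $project stage and
# the field-to-type mapping for Schema from one description of the fields.
def build_flatten_pipeline(fields):
    """fields maps a new top-level name to a (dotted path, Python type) pair."""
    # Each new name references its nested source via a "$"-prefixed field path.
    projection = {name: '$' + path for name, (path, _) in fields.items()}
    # The same names, mapped to the types we expect in the resulting columns.
    schema_fields = {name: typ for name, (_, typ) in fields.items()}
    return [{'$project': projection}], schema_fields


pipeline, schema_fields = build_flatten_pipeline({
    'ab': ('a.b', int),
    'ac': ('a.c', int),
})
print(pipeline)       # [{'$project': {'ab': '$a.b', 'ac': '$a.c'}}]
print(schema_fields)  # {'ab': <class 'int'>, 'ac': <class 'int'>}
```

The two results can then be passed to aggregate_pandas_all() as the pipeline and as Schema(schema_fields). Keeping both in one mapping avoids the projection and the schema drifting apart when fields are added or renamed.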

We also have an open ticket (ARROW-9) to add direct support for nested fields.

If you have any other questions or feedback about PyMongoArrow, please feel free to reach out; we would be happy to chat more with you.

~ Shubham