How best to analyse my data?

Hello,

I’ve got around 30 collections.
All of them have the same data structure.
Inside the documents, I’ve got standard fields like name, created_at, etc.
Additionally, one field holds an object with a big structure (several hundred fields with sub-objects, sub-arrays, objects inside arrays, etc.).
My strong suit is Python, and I understand a bit about MongoDB: pipelines, aggregation, etc.
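To picture how such a deep field could end up as flat columns for a CSV or a BI tool, here is a minimal sketch in plain Python. The helper name `flatten` and the dotted-path naming (list positions used as path segments, e.g. `a.c.1.d`) are my own assumptions, not something MongoDB or any BI tool prescribes.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into {"dotted.path": leaf_value} pairs.

    Hypothetical helper: list elements get their index as a path segment,
    and empty dicts/lists produce no columns at all.
    """
    items: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, sub in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            items.update(flatten(sub, path))
    elif isinstance(value, list):
        for index, sub in enumerate(value):
            path = f"{prefix}.{index}" if prefix else str(index)
            items.update(flatten(sub, path))
    else:
        # Scalars (str, int, datetime, ObjectId, ...) become leaf values
        items[prefix] = value
    return items


doc = {"a": {"b": 1, "c": [10, {"d": 2}]}, "e": "x"}
print(flatten(doc))
# {'a.b': 1, 'a.c.0': 10, 'a.c.1.d': 2, 'e': 'x'}
```

Indexing array positions keeps one row per document but can produce a lot of columns when arrays are long; unwinding (one row per array element) is the usual alternative.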

Now I’ve done some research and would like to get your advice:

  1. It seems I can’t easily create views that span multiple collections?
  2. I’d like to do very flexible analytics on my data, ideally with very little code, especially on this deeply structured field.
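On point 1: if the server runs MongoDB 4.4 or newer (an assumption, worth checking), the $unionWith stage can pull several collections into one pipeline, and that pipeline can be used as a view definition. A rough PyMongo sketch; the `source` tag field and the helper names are hypothetical.

```python
def union_view_pipeline(collections, tag_field="source"):
    """Build a pipeline that reads collections[0] and appends all the others.

    Every document gets a tag_field holding the name of its collection;
    $literal makes sure the name is stored as a plain string.
    """
    first, *rest = collections
    pipeline = [{"$addFields": {tag_field: {"$literal": first}}}]
    for name in rest:
        pipeline.append({
            "$unionWith": {
                "coll": name,
                "pipeline": [{"$addFields": {tag_field: {"$literal": name}}}],
            }
        })
    return pipeline


def create_union_view(db, view_name, collections):
    """Create a read-only view; db is a pymongo Database object."""
    db.command(
        "create",
        view_name,
        viewOn=collections[0],
        pipeline=union_view_pipeline(collections),
    )
```

For example, `create_union_view(client["mydb"], "all_data", ["coll_01", "coll_02"])` (made-up names). The view can then be opened in Compass or Metabase like a single collection; because it is a view, the union is recomputed on every query.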

I had several ideas:

  1. Use Metabase, which is OK until there are arrays (I need to $unwind them somewhere else first).
    It seems it would be best to create views with what I need in Compass and then use those views in Metabase. But how do I get the results from all collections?

  2. Use Python to iterate over all collections, load the required data into a Pandas DataFrame and take it from there.

  3. Use the MongoDB Connector for BI, (probably) build something like views again so they appear as MySQL tables, and then use Metabase or Cube to retrieve the data.
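For ideas 1 and 3, a view that unwinds the array before Metabase (or the BI Connector) sees it might look like this sketch. The array path, output column names and helper names are placeholders; the $project stage uses "$path" expressions to copy nested values into flat, top-level column names.

```python
def unwind_view_pipeline(array_path, columns):
    """One output document per array element, with flat column names.

    array_path: dotted path to the array, e.g. "data.items" (hypothetical).
    columns: {output_column: dotted_source_path}.
    """
    return [
        # preserveNullAndEmptyArrays keeps documents whose array is missing or empty
        {"$unwind": {"path": f"${array_path}", "preserveNullAndEmptyArrays": True}},
        # "$path" expressions copy (possibly nested) values into top-level fields
        {"$project": {"_id": 0, **{col: f"${path}" for col, path in columns.items()}}},
    ]


def create_unwind_view(db, view_name, source_collection, array_path, columns):
    """db is a pymongo Database; creates a read-only view on source_collection."""
    db.command(
        "create",
        view_name,
        viewOn=source_collection,
        pipeline=unwind_view_pipeline(array_path, columns),
    )
```

Keep in mind that $unwind multiplies the row count by the array length, so large arrays make the view correspondingly bigger.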

I’d need the data first as a standard CSV report for checking.
Afterwards I need charts, but those could be done in LibreOffice Calc or any other charting software.
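A minimal sketch of idea 2 feeding the CSV report, assuming pandas is installed and that the deep object lives in a field called `data` (a placeholder name). `pd.json_normalize` turns nested objects into dotted column names; arrays stay as single cells, so they would still need to be unwound or flattened separately.

```python
import pandas as pd


def fetch_docs(db, names, field="data"):
    """Read one field from each collection; db is a pymongo Database."""
    return {
        name: list(db[name].find({}, {field: 1, "_id": 0}))
        for name in names
    }


def frame_from_collections(docs_by_collection, field="data"):
    """Combine {collection_name: [documents]} into one DataFrame.

    Nested objects become dotted column names; columns missing in some
    collections are filled with NaN by pd.concat.
    """
    frames = []
    for name, docs in docs_by_collection.items():
        df = pd.json_normalize([doc.get(field, {}) for doc in docs])
        df.insert(0, "source", name)  # remember where each row came from
        frames.append(df)
    return pd.concat(frames, ignore_index=True, sort=False)


# Usage (hypothetical database name; list_collection_names() may also
# return views, so filter it if needed):
# from pymongo import MongoClient
# db = MongoClient()["mydb"]
# df = frame_from_collections(fetch_docs(db, db.list_collection_names()))
# df.to_csv("report.csv", index=False)
```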

Any ideas on how to solve the issue with multiple collections?
And which low-code/no-code BI/analytics tools can I use?
Is there a low-code/no-code way to click my views together instead of writing the pipelines myself?

Thanks,
Chris

Hi Chris,

Is this deployment on MongoDB Atlas or somewhere else?

I am asking because there is a data lake connector for Atlas clusters that allows putting many collections under one virtual collection defined in the data lake…

Additionally, are you aware of the PyMongoArrow library?

It should make your life easier when reading MongoDB data into Python.

Otherwise, you will have to use views or $merge to create a new virtual collection of all the data in the on-prem …
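For the $merge route on a self-hosted server, one option is to loop over the collections from Python and copy each into a single combined collection ($merge needs MongoDB 4.2+). Unlike a view, this writes a real collection that has to be refreshed by re-running the copy. The target name `all_data`, the `source` field and the helper names are placeholders, and the sketch assumes `_id` values don't collide across collections (true for default ObjectIds).

```python
def merge_pipeline(source_name, target="all_data"):
    """Pipeline that copies one collection into the combined target collection."""
    return [
        # Tag every document with the collection it came from
        {"$addFields": {"source": {"$literal": source_name}}},
        # Upsert by _id: assumes _id values are unique across all source collections
        {"$merge": {
            "into": target,
            "on": "_id",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }},
    ]


def merge_all(db, names, target="all_data"):
    """Run the copy for every collection; db is a pymongo Database."""
    for name in names:
        if name == target:
            continue  # don't merge the target into itself
        # $merge runs server-side when aggregate() is called; the cursor is empty
        db[name].aggregate(merge_pipeline(name, target))
```

With "whenMatched": "replace", re-running refreshes changed documents, but documents deleted from a source collection stay in the target until it is dropped and rebuilt.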

Thanks
Pavel

Hello Pavel,

I’ve got this MongoDB installed on my own VPS (so no Atlas or Atlas Data Lake available).

Since yesterday, I’ve been playing around with Apache Spark / PySpark, but it seems rather complicated to get it running in the first place.

I’ll give PyMongoArrow a try - maybe that’s easier than Spark. Thanks for the hint.
