at my job, we’ve been evaluating the MongoDB Connector for BI for our visualization project. Unfortunately there are some problems we have noticed, especially in the interplay with Power BI and the Power Query editor. If we cannot resolve them, we will have to change our planned software stack and remove the BI Connector from the equation.
Concerns our largest collection:
~34m documents and growing
~34 GB uncompressed
~1 kB Avg. document size
2 Indexes: _id, and a “foreign key” (ObjectId linking this (n:) to document (:1) in much smaller collection)
After storing about ~20m documents, performance gets severely degraded. Most times an update of Power BI reports is possible after 1 or 2 retries. We are looking to move to higher Power BI tiers anyway, where we’d do refresh partitions to avoid re-importing the whole DB on every report refresh.
However, the really big problem is that the Power Query editor stops working. Various error messages emerge but no real indication of what is wrong. Through the MySQL endpoint I usually got an error code somewhere very close to Int32.MinValue, if that means anything to anyone.
Things we tried:
Turning off all user-adjustable background tasks in Power BI
Creating a view that reduces number of documents by date field ($lookup from superordinate entity, indexed field) >>> was even slower/less stable for some reason, “cardinality altering” in logs but no real explanation – and I do not actually alter the cardinality of the result
Making sure we set always set the “supported row reduction clause” to LIMIT on our Power BI data source
Power Query column profiling based on first 1000 rows (the default)
Power Query column profiling based on all rows (tried out of desperation)
Use the BI Connector’s MySQL endpoint instead of the MongoDB ODBC Driver
Things not yet tried:
Use a view based on a denormalized date field (just thought of it now)
Throw more CPU/RAM at the problem. Current host is admittedly underspecced but one would hope that 1000 rows for column profiling can be fetched by any machine
So, we currently persist an older subset of the data from xyz_collection to xyz_collection_archive and then delete them from the source collection. Hardly ideal, as we were hoping to visualize without a time limit.
Hello @Fabian_Schneider My name is Alexi Antonino and I am the Product Manger for the BI Connector and Atlas SQL. First up, I am sorry the BI Connector with Power BI is not giving you the results necessary.
My team is currently working on making Atlas SQL generally available (expected June of 2023). Atlas SQL will eventually replace the Atlas BI Connector. Atlas SQL is built upon a SQL-92 compatible dialog (not MySQL) and uses our Atlas Data Federation for a query engine allowing us to limit namespaces. Also, we are running an Private Preview program now for an Atlas SQL Power BI Connector - it is a custom connector build specifically for MongoDB. This first version of the connector does not offer support for Direct Query (which I think you might benefit from based on your data volumes). We plan to release a version with DQ later this summer though. The version today, while it only supports the import mode (which can limit data based on memory), might still be ok if we are able to limit your namespaces or data using Data Federation and views. If your MongoDB instance is in Atlas - let’s connect right away and I can show you this new connector and share with you the roadmap. I can even show you this if you are using an On-Premise version of MongoDB because while it is very future looking, we still plan to offer an on-prem SQL interface as well (but we are in the early phases of scoping this out).
If you are interesting in seeing a demo of Atlas SQL +Power BI, or just hearing/seeing the roadmap for the MongoDB SQL Interface, please email me and we can schedule some time.
Also, if the BI Connector is not going to work for you I can provide you with a list of some 3rd party connectors that might suit your needs (MongoDB+Power BI with DQ support). This might be a path to consider if the Atlas SQL connectors and their timeline don’t align with yours.
Dealing with performance issues when working with large datasets can be a real pain. From what you’ve described, it seems like you’ve tried a lot of things already, but still no luck. Have you considered looking into some power bi courses that might help you optimize your workflow and get the most out of your data? As for your specific question about image format and social media, it depends on the platform you’re using. For example, Instagram tends to prefer square images with a resolution of 1080x1080 pixels. Facebook, on the other hand, allows for a wider variety of image sizes and resolutions. In general, it’s a good idea to aim for high-quality images that are optimized for the specific platform you’re using.
Hi!
Just found this topic through Google and I’m experience the same problems with the performance.
Since you said that June 2023 will be probably the first release date I’m wondering if this is still the case and when we can expect the first version?