We are currently planning to switch from a self-hosted MongoDB v5 standalone to an Atlas cluster with the standard three-node replica set.
We are hosting our SaaS, Orderlion, which basically runs several webshops and mobile app endpoints for our F&B suppliers.
The typical load consists of up to 200 concurrent users doing “typical webshop stuff” - they search for products, add products to their cart, change quantities and send orders. These are B2B users though, so a typical order contains anywhere from 30 to 50 positions - a very different use case (and load) compared to B2C.
The bottleneck occurs when our separate Node.js app (we call it the “worker”) imports product and price data into the DB. A single supplier can trigger an import of ~100 CSV files, each containing data for ~2,000 products and their prices.
While this import is running (each CSV file takes anywhere from 5 to 20 seconds), the load on the DB is pretty significant. We run all these queries as unordered bulk operations, but the load on the DB still becomes very high, and during this time all the “normal” queries from the (buyer) users searching for products and placing orders become a lot slower!
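For context, per CSV file the worker essentially builds one unordered bulk upsert - roughly like this sketch (the collection name, row shape, and field names here are simplified placeholders, not our actual schema):

```javascript
// Build an unordered bulkWrite payload from parsed CSV rows.
// Row shape and key fields (supplierId, sku) are illustrative only.
function buildBulkOps(rows) {
  return rows.map((row) => ({
    updateOne: {
      filter: { supplierId: row.supplierId, sku: row.sku },
      update: { $set: { name: row.name, price: row.price } },
      upsert: true,
    },
  }));
}

// Usage (with an open MongoDB `db` handle):
// const ops = buildBulkOps(parsedCsvRows);
// await db.collection('products').bulkWrite(ops, { ordered: false });
```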
→ What can we do?
- Should I try to use a different read preference on the worker’s bulk ops?
- Should I play around with write concerns? If yes, what value should I use?
- What else could I do?
It is not necessary that all the imported product and price updates are immediately visible to the (buyer) users - a short delay is no problem! I just want to keep the load on the DB as low as possible and - that is at least my idea - let the worker “work with secondaries”, while all the “normal buyer users” on our regular webapp work with the primary node.
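For reference, this is roughly how I imagine passing those options in the Node.js driver - as far as I understand, read preference only applies to reads (writes always go to the primary), so the values below are just what I would try first, not a recommendation:

```javascript
// Options the worker could pass - read preference for its lookup queries,
// write concern for the bulk import. Values are a starting point only.
const workerReadOptions = { readPreference: 'secondaryPreferred' };
const workerWriteOptions = {
  ordered: false,
  // w: 1 acknowledges after the primary alone (instead of the default
  // "majority"), which shortens ack latency; replication still happens.
  writeConcern: { w: 1 },
};

// Usage (illustrative):
// await db.collection('products').find(query, workerReadOptions).toArray();
// await db.collection('products').bulkWrite(ops, workerWriteOptions);
```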
Thanks a lot for your insights in advance!