Hello, I'm also a former MongoDB employee.
I'm working on an experiment to compare the performance of MongoDB, Redis, ScyllaDB, MariaDB, and MySQL (NoSQL) on on-premise and hybrid infrastructure, as well as in the cloud using the cloud services offered by each vendor.
My focus will eventually move to cloud offerings such as Atlas, comparing cloud vs. on-premise performance, then how mobile device services such as Device Sync compare with AWS AppSync, and then GraphQL services where applicable.
This is predominantly for academic research, and I'm working to be as unbiased and direct about the findings as I can.
The Problem
Using MongoDB 6.0.5, 5.0.15, and 4.4.19, even when no data is being stored at all and the service is simply running, MongoDB is reading and writing at the following rates:
6.0.5 is going through 68kB a minute of Read/Write
5.0.15 is going through 74kB a minute of Read/Write
4.4.19 is going through 39kB a minute of Read/Write
The impact of this:
4.4.19 will overwrite ~20.5GB of SSD space per year per instance/service.
5.0.15 will overwrite ~38.9GB of SSD space per year per instance/service
6.0.5 will overwrite ~35.74GB of SSD space per year per instance/service
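For anyone who wants to check the yearly figures, here is a minimal sketch of the conversion from the per-minute rates above. It assumes a 365-day year and decimal units (1 GB = 1,000,000 kB). The function name `yearly_gb` is just illustrative.

```python
# Convert an idle I/O rate in kB/minute into GB/year.
# Assumptions: 365-day year, decimal units (1 GB = 1,000,000 kB).
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes


def yearly_gb(kb_per_minute: float) -> float:
    """Return the yearly volume in GB for a rate given in kB/minute."""
    return kb_per_minute * MINUTES_PER_YEAR / 1_000_000


# Measured idle rates from the post, in kB/minute.
rates = {"6.0.5": 68, "5.0.15": 74, "4.4.19": 39}

for version, rate in rates.items():
    print(f"{version}: ~{yearly_gb(rate):.2f} GB/year")
```

The results agree with the numbers listed above to within rounding.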
Note that this is without any data at all: just an instance running the Community Server with an admin account login. No sharding, and no configuration other than the following:
Storage path, system log destination, the port, and process management. Everything else is left at default.
This isn't seen with the other vendors. Is there a specific operation that causes this? It matters because it contributes to premature wear of SSDs, which inherently have limited write endurance. Once you start putting data on the service, such as JSON documents with typical Name, Address, and Phone Number fields, or just the sample Airbnb data, the at-rest reads/writes can jump to almost twice the idle rate.
If you then start a 3-shard cluster and add up the I/O of each shard, the total at rest multiplies further. While researching the root cause, I found a post by user Martin_Beran describing the same issue with a similar finding, apparently from January 2021.
Is there any information on why it is doing this, or how to throttle these miscellaneous reads/writes to save on hardware?
Are there any known performance impacts from doing so?
Is this some kind of old bug?
I don’t see this being a problem with cloud-managed services like Atlas, but for on-premise and hybrid deployments this introduces an MTTF impact on hardware that is significant in comparison to other services.
Another Large Question
What exactly is MongoDB reading and writing when it’s not storing anything? In my attempts to find whatever it’s writing or reading, I can’t find anything at all. This is just the service running after installation and basic configuration, with no sample data loaded. Once you add sample data, the at-rest read/write rates increase sharply (almost double, as noted above) for no known reason.
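In case it helps anyone reproduce the measurement, here is a rough sketch of one way to sample a process's disk I/O. It assumes a Linux host, which may not match every setup. It reads the kernel's per-process counters from `/proc/<pid>/io`, which usually requires running as root or as the same user as `mongod`. The helper names `parse_proc_io` and `sample_io` are hypothetical, not part of any MongoDB tooling.

```python
# Sample per-process disk I/O on Linux via /proc/<pid>/io.
# Assumption: Linux only; the caller has permission to read the file.
import time
from pathlib import Path


def parse_proc_io(text: str) -> dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value.strip())
    return counters


def sample_io(pid: int, interval_s: float = 60.0) -> dict[str, int]:
    """Return bytes read from / written to storage during the interval."""
    path = Path(f"/proc/{pid}/io")
    before = parse_proc_io(path.read_text())
    time.sleep(interval_s)
    after = parse_proc_io(path.read_text())
    # read_bytes / write_bytes count I/O that reached the storage layer.
    return {k: after[k] - before[k] for k in ("read_bytes", "write_bytes")}
```

Calling `sample_io(pid)` with the `mongod` process ID returns the byte deltas over one minute. That shows how much the process is reading and writing, but not which files are involved.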