Hi everyone!
I'm currently building a MongoDB test system on a server for a proof-of-concept project, to explore the limits for a later telecommunication project, but I'm running into performance issues with MongoDB on this server.
The specs of the server:
CPU: 2x AMD EPYC 7453
RAM: 256 GB
Storage: 12x 20 TB Ultrastar HDD, 2x 1 TB NVMe SSD
OS: Debian 11
I'm working with more or less realistic dummy data, about 8.5 kB per document (timestamp, IPv6 addresses, random 32- and 64-bit values, etc.), with database sizes on the scale of 10–100 GB (later scaling up into terabyte territory). All tests ran on the same server against the localhost address.
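To give an idea of the data, here is a sketch of how one such document could be generated; the field names and the filler string are placeholders I picked for illustration, not my exact schema:

```python
import os
import random
import ipaddress
from datetime import datetime, timezone

def make_dummy_doc(filler_bytes=8192):
    """Build one ~8.5 kB test document; field names are illustrative only."""
    return {
        "ts": datetime.now(timezone.utc),                            # timestamp
        "src": str(ipaddress.IPv6Address(random.getrandbits(128))),  # IPv6 address
        "dst": str(ipaddress.IPv6Address(random.getrandbits(128))),  # IPv6 address
        "v32": random.getrandbits(32),                               # random 32-bit value
        "v64": random.getrandbits(64),                               # random 64-bit value
        "blob": os.urandom(filler_bytes // 2).hex(),                 # filler to reach ~8.5 kB
    }
```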
Without sharding or replica sets, with a single mongod process, the results were something like this:
- without indexing, about 60,000–85,000 inserts per second, depending on how batched the data was (e.g. inserting 100, 1,000, or 100,000 documents at once)
- with indexing, about 30,000–60,000 inserts per second, depending on the number of indexed fields (and the batch size)
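For context, the measurement loop boils down to something like this (a sketch assuming pymongo; the "poc"/"events" names and the URI are placeholders, and `run_bench` needs a running mongod to actually do anything):

```python
import time

def batches(docs, size):
    """Yield insert_many-sized chunks of a document list."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def docs_per_second(n_docs, elapsed):
    """Throughput of one benchmark run."""
    return n_docs / elapsed

def run_bench(uri="mongodb://localhost:27017", n_docs=100_000, batch=1_000):
    """Bulk-insert dummy documents and return inserts per second.

    Assumes pymongo is installed and a mongod (or mongos) is listening
    at `uri`; the "poc"/"events" namespace is made up for this sketch.
    """
    from pymongo import MongoClient
    coll = MongoClient(uri)["poc"]["events"]
    docs = [{"ts": i, "blob": "x" * 8500} for i in range(n_docs)]
    start = time.perf_counter()
    for chunk in batches(docs, batch):
        coll.insert_many(chunk, ordered=False)  # unordered: better bulk throughput
    return docs_per_second(n_docs, time.perf_counter() - start)
```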
This seemed like a realistic range based on articles and previous tests (to be clear, these figures are for the inserts themselves; no data processing is included). Still, the server was not running at full power, so it should be capable of higher overall performance.
At this point it seemed like a good opportunity to test sharding on a single server: the idea was that if more mongod instances ran on the same machine, the overall performance would be higher. I used the timestamp as a ranged shard key, since it is a more or less "incremental" value and therefore shouldn't need hashing for proper load balancing. And this is where I lost track.
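The sharding setup itself is just two admin commands (sketched via pymongo against the mongos router; "poc.events" and "ts" are placeholder names). One caveat I'm aware of: MongoDB's documentation warns that a ranged shard key on a monotonically increasing value routes all new inserts to the chunk with the highest range, i.e. to one shard at a time, which is why the hashed variant is left in as a comparison option:

```python
RANGED_KEY = {"ts": 1}          # ranged shard key on the timestamp field
HASHED_KEY = {"ts": "hashed"}   # hashed alternative for comparison

def shard_collection(uri="mongodb://localhost:27017", key=RANGED_KEY):
    """Shard the placeholder namespace "poc.events" with the given key.

    Assumes pymongo is installed and `uri` points at the mongos router,
    not at one of the shards' mongod processes.
    """
    from pymongo import MongoClient
    admin = MongoClient(uri).admin
    admin.command("enableSharding", "poc")
    admin.command("shardCollection", "poc.events", key=key)
```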
With 3 shard servers on one SSD, the insertion speed was around 15,000–40,000 inserts per second. With 11 shard servers, still on one SSD, I got around 15,000–35,000 inserts per second, and when I switched to 11 shard servers, each on a dedicated HDD, around 1,000–30,000 inserts per second, which is very far behind the unsharded results.
The CPU was not running at full load, the total system memory is about 25 times the size of the database(s), and even a single HDD (let alone one SSD) should be able to sustain a higher write rate than that. Maybe I should test other shard key strategies, or change the test scenario in case the storage cache is skewing the results, but I'm not really sure.
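One sanity check I could run before blaming the storage cache: count how many chunks each shard actually owns, to see whether the inserts are being spread at all (a sketch assuming pymongo and a mongos on localhost; the config.chunks metadata collection is maintained by MongoDB itself):

```python
# Pipeline over the cluster metadata: number of chunks owned per shard.
PER_SHARD = [{"$group": {"_id": "$shard", "chunks": {"$sum": 1}}}]

def shard_chunk_counts(mongos_uri="mongodb://localhost:27017"):
    """Return {shard_name: chunk_count} from the cluster's config database.

    Assumes pymongo is installed and `mongos_uri` points at the mongos.
    """
    from pymongo import MongoClient
    client = MongoClient(mongos_uri)
    return {row["_id"]: row["chunks"]
            for row in client["config"]["chunks"].aggregate(PER_SHARD)}
```

If almost all chunks sit on one shard, the ranged timestamp key would be the first suspect rather than the disks.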
Does anyone have any suggestions on the topic?