Performance issue loading data into MongoDB

I think you are right. I’m going to let the load run a bit longer, then I’ll probably tear it all down, set up a single r5.4xl mongod server, and see how its performance compares.

That’s a bummer about the way replica sets work. I’d hoped that a two-node replica set would simply become a one-node replica set when a member goes down, since one node is automatically a majority of one. After reading up on it, it looks like I need an arbiter to make failover work properly. Which raises the question: can I set up a single machine that acts as the arbiter for all six 2-node replica sets?
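To spell out why the two-node set does not fail over: as I understand it, the majority is counted over the configured voting members, not over the members that are still reachable. Here is a minimal sketch of that arithmetic. The function names are my own, not anything from MongoDB.

```javascript
// Votes needed to elect a primary: a strict majority of the
// configured voting members (hypothetical helper, not a MongoDB API).
function votesNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - votesNeeded(votingMembers);
}

// 2 members = my current replica sets, 3 = the same sets plus an arbiter.
for (const n of [1, 2, 3]) {
  console.log(
    `${n} voting member(s): need ${votesNeeded(n)} vote(s), can lose ${faultTolerance(n)}`
  );
}
```

So with two voting members the set needs both votes and tolerates no failures, while adding an arbiter as a third voter lets it survive the loss of one member.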

And my shard key is a hash. Technically it is a hash of a hash: the unique key for a record in this collection is the pair (userID, teamNumber), so in preprocessing I create an additional column called “hash”, which is the hash of the combination of userID and teamNumber. MongoDB then hashes that value again for the hashed shard key. I ran the command

sh.shardCollection("database.collection",{hash: "hashed"})

to shard the collection on a hashed index of that field.
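For reference, the preprocessing step looks roughly like the sketch below. Treat the details as assumptions for illustration: SHA-256, the JSON encoding of the pair, the helper name recordHash, and the sample values are stand-ins, not necessarily what my pipeline uses.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive the "hash" column from the two key fields.
// SHA-256 and the JSON encoding are assumptions made for this sketch.
function recordHash(userID, teamNumber) {
  // Encode the pair as JSON so ("1", "23") and ("12", "3") stay distinct,
  // which plain string concatenation would not guarantee.
  const key = JSON.stringify([userID, teamNumber]);
  return crypto.createHash("sha256").update(key).digest("hex");
}

// Made-up sample record, only to show the shape of the document.
const doc = { userID: "u1001", teamNumber: 7 };
doc.hash = recordHash(doc.userID, doc.teamNumber);
console.log(doc);
```

The same (userID, teamNumber) pair always produces the same hash value, so a record lands on the same shard every time it is loaded.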

It seems to be working correctly: network traffic, storage used, and memory used are similar across all six primary nodes.