Thanks John, yes using a standalone is not ideal, we used to have replication but the performance lagged behind what we needed, cutting back to the bare minimum has been the only way we’ve been able to keep up (until now) and as we are only storing data on a best effort basis we’ve been able to put up with the inconvenience. Thanks for the info on the deletes, that could well be what we see, the deletes happen at the start of the collection and the inserts at the end so I can imagine theres a lot of paging from disk happening.
Theres not really an option to reduce the data, we are at the mercy of what is being sent to us, we can’t control the json.
$unionWith sounds very interesting, I wondered about keeping a collection a day and just dropping a day at a time but I thought my queries would suffer, I’ll certainly look into this.
Looks like it’s time to shard, I’ll order more memory for the servers and resurrect my sharding experiments.