We store fairly large JSON documents in MongoDB and will be adding tens of millions of documents every month.
We found that storing 30 million documents eats up 2 TB of storage (roughly 67 KB per document).
As the format is JSON, there is a lot of repetition (every key repeats in every document). So we started shortening the keys and were able to reduce the size to 300 GB (about 10 KB per document).
That's good, but I feel it's still way too much. Without the JSON boilerplate we could easily reduce it by another factor of 10.
On the other hand, I feel that messing around with JSON keys and transforming them from something human-understandable like 'baseCurrencyAmount':'EUR' into something barely readable like 'baCA':'EUR' is, in general, the wrong direction. But as you can see, the actual value is still shorter than even the shortened key.
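For context, here is a minimal sketch of the kind of translation layer I mean (pymongo, with a made-up key map; our real schema is much bigger):

```python
import pymongo

# Illustrative key map, not our real schema: readable key -> short storage key.
KEY_MAP = {
    "baseCurrencyAmount": "baCA",
    "quoteCurrencyAmount": "quCA",
    "timestamp": "ts",
}
REVERSE_MAP = {v: k for k, v in KEY_MAP.items()}

def shorten(doc):
    """Swap long keys for short aliases before insert (top-level keys only)."""
    return {KEY_MAP.get(k, k): v for k, v in doc.items()}

def expand(doc):
    """Restore the readable keys after reading back from MongoDB."""
    return {REVERSE_MAP.get(k, k): v for k, v in doc.items()}

client = pymongo.MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["docs"]

coll.insert_one(shorten({"baseCurrencyAmount": "EUR", "timestamp": 1700000000}))
print(expand(coll.find_one()))  # keys come back human-readable
```

It works, but it means every query and index definition has to use the short names too, which adds to the "wrong direction" feeling.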
Is there any hint you can give? Is this a case where a document database is not the right tool and we need to go back to good old SQL?
And a side question: it seems that MongoDB's compression compresses each document separately. If it compressed documents jointly somehow, my guess is that the key lengths would not play a role any more.
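To illustrate what I mean, a quick experiment outside of MongoDB (using the python zstandard package; the documents and numbers are illustrative only) comparing per-document against joint compression:

```python
import json
import zstandard as zstd

# Fake documents with long, repeated keys (illustrative only).
docs = [
    {"baseCurrencyAmount": "EUR", "quoteCurrencyAmount": "USD", "price": i}
    for i in range(10_000)
]
raw = [json.dumps(d).encode() for d in docs]

cctx = zstd.ZstdCompressor()

# Compress each document on its own: every document pays the full
# cost of its keys, so shortening them helps a lot.
per_doc = sum(len(cctx.compress(r)) for r in raw)

# Compress the whole batch jointly: repeated keys become cheap
# back-references, so their length should barely matter.
joint = len(cctx.compress(b"".join(raw)))

print(f"per-document: {per_doc:,} bytes, joint: {joint:,} bytes")
```

If MongoDB's compression effectively did something like the second case across documents, the whole key-shortening exercise would be unnecessary.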