Is the current 1TB data limit per cluster, per database, or per collection?
Currently, the combined size of my collections + indexes in a serverless cluster is about 150GB. However, my “dataSize” is 400GB. Does serverless charge for dataSize or for the data actually stored?
If dataSize is what the customer pays for, is there a way to decrease it? For our use case we don’t expect the data to go beyond 200GB. I’m not sure whether serverless is trying to be smart about how large the cluster should be.
Hi @Ouwen_Huang and welcome to the MongoDB Community!
The 1TB limit is per cluster. See all the other limitations here:
About the costs, you can read more about them here:
You can also check the Billing tab in Atlas to see exactly what is counted. Don’t forget that a MongoDB Atlas cluster contains your data + the oplog + the indexes + system collections, and all of this needs space. If you want to reduce your data size, you could consider archiving some data to the Data Lake. A poorly suited data model (schema) can also inflate your data size unnecessarily.
WiredTiger compression can mitigate this somewhat, but it isn’t magic either.
Thanks for the quick response! I am trying to debug why the storage size is so large (4x the underlying collection). dataSize was about 400GB both before and after I created a couple of small (<1GB) indexes. The oplog + system collections should be using the defaults.
I’ve also tried running the compact command on my collections via mongosh, but it doesn’t seem to have affected the size.
From what I see in these stats, you have 423 GB of uncompressed data stored in MongoDB, which WiredTiger compression reduces to 104 GB. You also have 15 GB of indexes.
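To make those numbers concrete: the roughly 4x gap between dataSize and the on-disk size is simply the WiredTiger compression ratio. Here is a minimal Python sketch, assuming you have a dict shaped like the output of the dbStats command (the dataSize, storageSize and indexSize fields, in bytes). The helper name summarize_stats is hypothetical, and the sample values are the GB figures quoted in this thread converted to bytes, not real command output.

```python
GB = 1024 ** 3

def summarize_stats(stats: dict) -> dict:
    """Summarize a dbStats-style dict (sizes in bytes) in GB.

    dataSize    -> uncompressed size of the BSON documents
    storageSize -> space the collections occupy on disk after compression
    indexSize   -> space used by the indexes
    """
    data_gb = stats["dataSize"] / GB
    storage_gb = stats["storageSize"] / GB
    index_gb = stats["indexSize"] / GB
    return {
        "uncompressed_gb": data_gb,
        "on_disk_gb": storage_gb,
        "index_gb": index_gb,
        # How many times smaller the data is on disk than as raw BSON
        "compression_ratio": data_gb / storage_gb,
    }

# Figures quoted in this thread (423 GB, 104 GB, 15 GB), converted to bytes.
# With PyMongo, a real dict could come from client[db_name].command("dbstats").
sample = {"dataSize": 423 * GB, "storageSize": 104 * GB, "indexSize": 15 * GB}
summary = summarize_stats(sample)
print(f"compression ratio: {summary['compression_ratio']:.2f}x")  # prints: compression ratio: 4.07x
```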
My bad about the rs.printReplicationInfo() command: it’s actually mentioned in the Atlas Serverless Limitations link I shared earlier. There is no access to the collections in the local database.
Did you check your billing and how much data storage is billed for this cluster?
I talked with the Atlas & Serverless team, and they explained a bit more about how it works.
So, yes: Atlas bills Serverless based on the uncompressed size of your BSON docs + the indexes. The idea behind billing on uncompressed rather than compressed data is that the final price doesn’t depend on the performance of the compression algorithm that WiredTiger uses. So it’s always fair, and it wouldn’t change if we updated the compression algorithm in the future.
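Applied to the figures from this thread, that rule gives a billed size much larger than what sits on disk. A rough Python sketch, assuming billable storage is dataSize + indexSize as described above (whether the index figure is counted before or after its own compression is an assumption here); the function names are hypothetical, not an Atlas API:

```python
GB = 1024 ** 3

def billable_storage_gb(data_size: int, index_size: int) -> float:
    """Storage billed on Serverless, per the explanation above:
    uncompressed BSON size (dbStats dataSize) plus index size, in GB."""
    return (data_size + index_size) / GB

def on_disk_gb(storage_size: int, index_size: int) -> float:
    """What actually sits on disk after WiredTiger compression, in GB."""
    return (storage_size + index_size) / GB

# Thread figures: 423 GB uncompressed, 104 GB compressed, 15 GB of indexes
billed = billable_storage_gb(423 * GB, 15 * GB)
stored = on_disk_gb(104 * GB, 15 * GB)
print(billed, stored)  # prints: 438.0 119.0
```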
If Atlas Serverless billed on the compressed size, the per-GB price would have to be 4x or 5x higher, which would look less competitive. It would also be less predictable, because compression is more or less effective depending on your schema design, the field types, etc. That would make it harder to predict your serverless costs in advance and plan your spending.
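A quick arithmetic sketch of the predictability argument, in Python. The $0.25/GB price and the 4x ratio used to rescale it are made-up numbers for illustration, not Atlas pricing: with a fixed per-GB price on compressed data, the same 423 GB of documents produces different bills depending on how well they compress.

```python
def monthly_cost(size_gb: float, price_per_gb: float) -> float:
    """Storage cost for one month at a flat per-GB price."""
    return size_gb * price_per_gb

UNCOMPRESSED_PRICE = 0.25  # hypothetical $/GB-month, NOT the real Atlas price
UNCOMPRESSED_GB = 423.0    # dataSize quoted in this thread

# Billing on uncompressed size: the bill depends only on how much BSON is stored
cost_uncompressed = monthly_cost(UNCOMPRESSED_GB, UNCOMPRESSED_PRICE)

# To bill on compressed size, the per-GB price is scaled up by an assumed ratio
ASSUMED_RATIO = 4.0
compressed_price = UNCOMPRESSED_PRICE * ASSUMED_RATIO

# Same documents, different schemas/field types -> different actual ratios
costs_compressed = {}
for actual_ratio in (3.0, 4.0, 5.0):
    compressed_gb = UNCOMPRESSED_GB / actual_ratio
    costs_compressed[actual_ratio] = monthly_cost(compressed_gb, compressed_price)
    print(f"ratio {actual_ratio:.0f}x: ${costs_compressed[actual_ratio]:.2f} "
          f"(uncompressed billing: ${cost_uncompressed:.2f})")
```

Only when the actual ratio happens to match the assumed one (4x here) do the two billing schemes agree; otherwise the compressed-size bill moves with the data's compressibility.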
Finally, the 1TB storage limit is based on the compressed size of the collections + indexes. Users are expected to migrate to Atlas Dedicated clusters as they come close to the limit. In the future, the team wants to raise this limit or remove it completely.
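To see how far a cluster is from that limit, you could compare the compressed collection size plus index size against 1 TB. A minimal sketch, assuming "1TB" means 1024 GB and that storageSize + indexSize from dbStats is the figure the limit counts; the 80% warning threshold is a hypothetical planning value, not an Atlas setting:

```python
GB = 1024 ** 3
# Assumption: "1TB" taken as 1024 GB; check the Atlas docs for the exact figure.
SERVERLESS_LIMIT_BYTES = 1024 * GB
WARN_FRACTION = 0.8  # hypothetical point at which to plan a move to Dedicated

def limit_usage(storage_size: int, index_size: int) -> float:
    """Fraction of the storage limit in use, counting compressed
    collection data (dbStats storageSize) plus indexes (indexSize)."""
    return (storage_size + index_size) / SERVERLESS_LIMIT_BYTES

usage = limit_usage(104 * GB, 15 * GB)  # figures quoted in this thread
print(f"{usage:.1%} of the limit used")  # prints: 11.6% of the limit used
if usage >= WARN_FRACTION:
    print("Approaching the limit: consider an Atlas Dedicated cluster")
```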
@MaBeuLux88, this was very helpful. Thanks for asking on my behalf.
So far I’m quite happy with the performance of Mongo serverless. Our use case is extremely bursty, so paying for compute usage is very attractive. I did notice some issues when scaling from 10GB of inserted data to 400GB: there was some downtime, which we needed to code resiliency for. I’m guessing there may have been a resource allocation trigger happening behind the curtain.
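For the resiliency part, one common pattern is to wrap writes in a retry loop with exponential backoff. A generic Python sketch, not the poster’s actual code: with_retries is a hypothetical helper, and with PyMongo you would pass transient error types such as pymongo.errors.AutoReconnect as retry_on. Only wrap operations that are safe to repeat (e.g. upserts), since a write may have succeeded before the error was reported.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def with_retries(
    op: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    base_delay: float = 0.5,
) -> T:
    """Call op(), retrying on the given exception types.

    Sleeps base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")

# Hypothetical usage with PyMongo (coll, filt, upd are placeholders, not run here):
# from pymongo.errors import AutoReconnect
# with_retries(lambda: coll.update_one(filt, upd, upsert=True), (AutoReconnect,))
```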
I think I understand what you mean about cost: if billed on compressed size, the per-GB storage price would just be scaled up 4x-5x, so it’s the same price, just a different way to view it. I would actually prefer pricing on compressed disk size, since pricing on “uncompressed” data encourages the customer side to design around it.
Disabling compression could increase the number of RPUs and WPUs that you consume. I’m not completely sure about that one, though.
But using compression is definitely a big performance boost.
It’s also the point of serverless: you don’t need to know how it’s managed in the background!