Inefficient/unpredictable storage for timeseries collection?

Hello, I’m experimenting with MongoDB 6.0 on Atlas, and more specifically with the new timeseries collection type.
I started with an M10 cluster tier and, using a custom-made Java application, I filled a collection named insight with 45M (million) sample documents. A typical document looks like this:

{
    "timestamp" : ISODate("2018-02-01T00:00:00.000+0000"),
    "meta" : {
        "customerId" : ObjectId("437573746f6d657232343639"),
        "insightId" : ObjectId("636f6e73756d61626c650000"),
        "itemDefinitionId" : ObjectId("4974656d4465663200000000"),
        "itemId" : ObjectId("4974656d3100000000000000"),
        "locationId" : ObjectId("4c6f636174696f6e37333200"),
        "month" : NumberInt(2),
        "tenantId" : ObjectId("64656d6f3100000000000000"),
        "year" : NumberInt(2018)
    },
    "_id" : ObjectId("63da7cd53e3dae5a51c21b4c"),
    "data" : {
        "value" : {
            "quantity" : 100.83965259603328,
            "cost" : 116.6917902887385
        }
    }
}
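
For reference, a collection holding documents like this is created as a timeseries collection roughly along these lines (timeField and metaField match the document above; the granularity value is just an example, not necessarily the one I used):

// Timeseries collection definition matching the sample document above.
// The granularity is an example value, not necessarily the one I used.
db.createCollection("insight", {
    timeseries: {
        timeField: "timestamp",   // the measurement time
        metaField: "meta",        // the sub-document used to group measurements into buckets
        granularity: "hours"
    }
})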

The meta fields are not completely random: counting the distinct combinations of meta fields with the following aggregation returns 1.5M:

db.insight.aggregate([
    {
        $group: {
            _id: {
                itemId: "$meta.itemId",
                itemDefinitionId: "$meta.itemDefinitionId",
                customerId: "$meta.customerId",
                locationId: "$meta.locationId",
                tenantId: "$meta.tenantId",
                insightId:"$meta.insightId",
                year:"$meta.year",
                month:"$meta.month"
            }
        }
    }]).toArray().length
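
As a side note, the same number can be obtained server-side with a $count stage, without pulling 1.5M group documents to the client:

db.insight.aggregate([
    { $group: { _id: {
        itemId: "$meta.itemId",
        itemDefinitionId: "$meta.itemDefinitionId",
        customerId: "$meta.customerId",
        locationId: "$meta.locationId",
        tenantId: "$meta.tenantId",
        insightId: "$meta.insightId",
        year: "$meta.year",
        month: "$meta.month"
    } } },
    { $count: "distinctMetaCombinations" }   // → { distinctMetaCombinations: ~1.5M }
])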

My bulk load procedure on M10 took 10+ hours; I had to restart it twice (each time skipping the time ranges already filled) because my PC went into suspension. It also went through two automatic Atlas storage resizes (which worked perfectly). At the end I created a secondary index with this definition:

{
    "meta.tenantId" : 1,
    "meta.insightId" : 1,
    "meta.customerId" : 1,
    "timestamp" : -1
}
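
For completeness, this is the corresponding createIndex call:

db.insight.createIndex({
    "meta.tenantId" : 1,
    "meta.insightId" : 1,
    "meta.customerId" : 1,
    "timestamp" : -1
})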

The secondary index is crucial for my experiment and I was really surprised to discover its size: 482.7 MB.
I then checked the collection stats (collStats) and got the following:

{
    "ns" : "test.insight",
    "size" : NumberLong(21479128473),
    "timeseries" : {
        "bucketsNs" : "test.system.buckets.insight",
        "bucketCount" : NumberInt(35610000),
        "avgBucketSize" : NumberInt(603),
        ...
    },
    "storageSize" : NumberLong(3451998208),
    ...
    "indexSizes" : {
        "meta.tenantId_1_meta.insightId_1_meta.customerId_1_timestamp_-1" : NumberInt(506097664)
    },
    ...
}

In summary, the storage size is 3.2 GB, whereas the data size is 20 GB. And again, the index is 482.7 MB. On top of that, the bucket count is 35.6 million.
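
If I understand the storage model correctly, secondary indexes on a timeseries collection are built over the buckets, so the index size scales with the bucket count rather than with the number of measurements. A quick back-of-the-envelope check, using the numbers above:

// Rough bucket-fill check for the first collection.
const stats = db.insight.stats();          // same collStats output as above
const measurements = 45000000;             // documents loaded by my Java application
print("measurements per bucket:", measurements / stats.timeseries.bucketCount);
// ~1.26 — almost one bucket per measurement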

I repeated my procedure, this time on M40, with no storage resize required. After 3 hours, without interruption, the same data appeared in the collection, but with completely different (and better) internals.
Not being sure about the repeatability of my load procedure, I then tried a very different approach: using 3T Studio I selected Duplicate Collection and got a copy of the very first collection (the one with storage size 3.2 GB, size 20 GB, and index 482.7 MB). The collection stats confirmed what I had seen in my second attempt:

{
    "ns" : "test.copy_of_insight",
    "size" : NumberLong(2691175332),
    "timeseries" : {
        "bucketsNs" : "test.system.buckets.copy_of_insight",
        "bucketCount" : NumberInt(2376681),
        "avgBucketSize" : NumberInt(1132),
        ...
    },
    "storageSize" : NumberInt(1077383168),
    ...
    "indexSizes" : {
        "meta.tenantId_1_meta.insightId_1_meta.customerId_1_timestamp_-1" : NumberInt(34258944)
    },
    ...
}

The rewritten collection has a size of 2.5 GB, a storage size of 1 GB and an index size of 32.7 MB. And the bucket count is 2.3 million.
I can provide a snapshot of the database with both collections (they are in my Atlas account) for further examination; there is no sensitive data in it.
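
A quick way to put the two collections side by side (assuming both are still in the test database):

// Compare bucket efficiency of the original and the duplicated collection.
["insight", "copy_of_insight"].forEach(name => {
    const s = db.getCollection(name).stats();
    print(name,
          "| buckets:", s.timeseries.bucketCount,
          "| avgBucketSize:", s.timeseries.avgBucketSize,
          "| storageSize:", s.storageSize,
          "| indexSize:", s.indexSizes["meta.tenantId_1_meta.insightId_1_meta.customerId_1_timestamp_-1"]);
})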

I wonder if, during my first load, I hit some kind of non-optimized (or buggy) code path related to bucket creation. I suspect that 35 million buckets led to the 400+ MB index size, and that the 3T Studio Duplicate Collection, by fully rewriting the data, optimized the bucket distribution, lowering the bucket count to 2.3 million and thus the index to 32 MB.
If you are aware that such a situation of “unoptimized buckets” can happen under certain conditions (such as a storage resize in Atlas), would it be possible to see, in a future version of MongoDB, an admin command to rebalance the buckets?
Please take into account that a 400+ MB index (vs an optimal 32 MB) has a serious impact on the Atlas cluster tier if you plan to keep the index in memory. Not to mention how differently the two collections performed on the same aggregation (using the secondary index), due to how differently the data is distributed between 35 million buckets and 2.3 million buckets.

The main difference between the two collections is the way they were created. The first collection was filled with a custom-made Java application, which took 10+ hours and went through storage resizes. The secondary index size was 482.7 MB. The second collection was created by duplicating the first collection with 3T Studio, which resulted in a smaller size, storage size, and index size (2.5 GB, 1 GB, and 32.7 MB respectively). The bucket count also decreased to 2.3 million. The difference in size and structure is likely due to the differences in the data loading process and the underlying storage mechanisms used in each case.

Hello Sumanta,
that’s exactly what I supposed.
The real questions are:

  1. how likely is the worst-case scenario to happen during the “normal” operation of a timeseries collection, i.e. not in a bulk load like the one I did for testing, but with “natural” data ingestion prolonged over months or years?
  2. and how does MongoDB help in amending the worst-case scenario, i.e. in rebalancing the buckets?

For question 1 I really have no idea.
For question 2 (identical data but a very different underlying storage distribution, with very different costs and performance, on the order of 10x), the first thing that comes to my mind is called “vacuum” on PostgreSQL, and the second is called “compaction” on Apache Cassandra. I would like to hear something about this from the MongoDB team…
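
In the meantime, the only “compaction” I can see doing myself is a full rewrite into a fresh timeseries collection, which is essentially what the Duplicate Collection feature did. A minimal mongosh sketch of the idea (target name, granularity and batch size are just placeholders; in practice I would drive this from my Java application):

// 1. Create a fresh timeseries collection with the same time/meta fields.
db.createCollection("insight_rebucketed", {
    timeseries: { timeField: "timestamp", metaField: "meta", granularity: "hours" }
})

// 2. Stream the measurements over and let the server rebuild the buckets.
let batch = [];
db.insight.find().forEach(doc => {
    batch.push(doc);
    if (batch.length === 10000) {
        db.insight_rebucketed.insertMany(batch, { ordered: false });
        batch = [];
    }
});
if (batch.length > 0) {
    db.insight_rebucketed.insertMany(batch, { ordered: false });
}

// 3. Recreate the secondary index on the new collection and swap it into the application.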

Thanks

I have also faced a similar issue. I have exactly two independent environments with metadata, both on the 7.0.2 Community version. One of them is much better at bucketing and the other is around 5x worse. What’s annoying is that the better-bucketing environment has only 16 GB of RAM and 8 CPUs, while the worse one has 128 GB of RAM and 16 CPUs, but as I mentioned, it is still worse than the other. I’m not sure what is causing the inefficiency during the load. It feels like some feature is needed to re-bucket the existing data, rather than relying only on bucketing on the fly.