Larger indexes in MongoDB 5 compared to MongoDB 4

Hi all,

I updated a system from MongoDB 4 to MongoDB 5 and observe much larger index sizes in MongoDB 5 than in MongoDB 4. I now run two systems in parallel (one with MongoDB 4 and one with MongoDB 5) and these are my observations for a collection with lots of data:

Metric                           MongoDB 4       MongoDB 5
count                        1.638.164.323   1.096.569.012
avgObjSize (bytes)                     495             495
indexSizes._id_ (bytes)     18.808.434.688  39.714.480.128
_id_ index bytes per doc                11              36
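The last row of the table is the `_id_` index size divided by the document count. Here is a minimal sketch of that calculation. It assumes a stats document shaped like the output of the `collStats` command and uses only its `count` and `indexSizes` fields. The helper name is made up for illustration.

```python
def avg_index_bytes_per_doc(stats: dict, index_name: str = "_id_") -> float:
    """Average index size per document, from a collStats-style document.

    Assumes `stats` carries `count` and `indexSizes` (sizes in bytes, i.e.
    the default scale of 1).
    """
    count = stats["count"]
    if count == 0:
        raise ValueError("collection is empty")
    return stats["indexSizes"][index_name] / count


# Figures from the table above (thousands separators removed).
mongo4 = {"count": 1_638_164_323, "indexSizes": {"_id_": 18_808_434_688}}
mongo5 = {"count": 1_096_569_012, "indexSizes": {"_id_": 39_714_480_128}}

for label, stats in (("MongoDB 4", mongo4), ("MongoDB 5", mongo5)):
    print(f"{label}: {avg_index_bytes_per_doc(stats):.1f} bytes per document")
```

With PyMongo, a document of this shape can be obtained with `db.command("collStats", "<collection>")`. The collection name is a placeholder. For the figures above, the script prints about 11.5 and 36.2 bytes per document, so each `_id` entry takes roughly three times as much space on MongoDB 5.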

The other indexes have grown as well, perhaps because of the extra space taken up by the _id_ field. What could be causing this index growth? The only configuration change I made was to put the database in its own directory (directoryperdb=true).

Records are inserted into the collection at a rate of about 1000 docs/second and are removed by a TTL index.
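Documents expire through a TTL index, so the document count depends on how long a system has been running as well as on the TTL. Here is a back-of-the-envelope sketch. It assumes the ~1000 docs/second rate is constant and that TTL deletion keeps pace with inserts. The post does not give the actual TTL value, so the 14-day value below is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def days_of_inserts(doc_count, docs_per_second):
    """Time span (in days) that `doc_count` inserts cover at a constant rate."""
    return doc_count / docs_per_second / SECONDS_PER_DAY


def steady_state_count(docs_per_second, ttl_seconds):
    """Little's law: once TTL deletes keep pace with inserts, the collection
    holds roughly rate x TTL documents."""
    return docs_per_second * ttl_seconds


RATE = 1_000  # docs/second, the approximate rate from the post (assumed constant)

for label, count in (("MongoDB 4", 1_638_164_323), ("MongoDB 5", 1_096_569_012)):
    print(f"{label}: {count:,} docs ~ {days_of_inserts(count, RATE):.1f} days of inserts")

# Hypothetical TTL of 14 days, purely for illustration.
print(f"steady state for a 14-day TTL: {steady_state_count(RATE, 14 * SECONDS_PER_DAY):,} docs")
```

Under those assumptions, the two counts in the table correspond to roughly 19 and 12.7 days of inserts. The two collections may therefore not have been at the same point in their lifecycle when the sizes were measured.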

Thanks for any hints
Franz

PS: The update was not a real in-place upgrade. I was able to simply install the new system with MongoDB 5 and start with a fresh database.


Hi @Franz_van_Betteraey,

That’s weird and most probably unexpected.
Are the documents exactly the same? (i.e. is the content of the _id identical?)
Does the _id contain the default ObjectId or something else? If it’s something else, does its size vary?
Which exact versions of MongoDB are you using? This could help track down a ticket.

Cheers,
Maxime.

Hi @MaBeuLux88,

the versions are 4.2.19 and 5.0.8. The document content is generated (test data), so it is not exactly the same but comparable. My client is a Spring Boot application using the Spring Data MongoDB framework. The id is of type org.bson.types.ObjectId. I do not set the id myself; this is done by MongoDB (or the Spring Framework). The ids look ‘normal’, like this (in both server versions):

        "_id" : ObjectId("623f2e0f200e061cb71ca9ae")
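According to the MongoDB documentation, an ObjectId is 12 bytes long. It consists of a 4-byte big-endian timestamp (seconds since the Unix epoch), a 5-byte random value and a 3-byte incrementing counter. Because the timestamp comes first, driver-generated `_id` values grow roughly in insertion order. Here is a small sketch that decodes the example id shown above. Note that `parse_object_id` is a hypothetical helper, not a driver API.

```python
from datetime import datetime, timezone


def parse_object_id(hex_id: str) -> dict:
    """Split a 24-hex-digit ObjectId into timestamp, random value and counter."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")  # bytes 0-3: creation time
    return {
        "size_bytes": len(raw),
        "created": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": raw[4:9].hex(),                      # bytes 4-8: random value
        "counter": int.from_bytes(raw[9:12], "big"),   # bytes 9-11: counter
    }


info = parse_object_id("623f2e0f200e061cb71ca9ae")
print(info["size_bytes"], info["created"].isoformat())
```

For the id above, this yields a 12-byte key created at 2022-03-26 15:15:27 UTC. That 12-byte raw size is a useful reference point when reading the per-document index figures in the table.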

With the server update I also updated the client to use Java driver version 4.6.0 instead of 3.11.2, and I have observed a drop in performance in my use case. I cannot say whether this is connected to the larger index; it could also be due to connection pooling or something else. When I use the old driver version (even with the new MongoDB 5 server), I do not observe any performance loss. Therefore I think the problem is more on the client side.

Thank you for your efforts
Franz

The first thing that comes to mind when I see these indexes is to understand how the index was built and how the collection lived so far.

When an index is freshly built (as _id is here, since it is created together with the collection), it is very compact and optimized. But as documents are added, removed, updated and added again, the entries spread out and leave free space between them.

If you rebuild an index, it will be very compact but will also have no free space in it. If you then add entries that are spread throughout the index, it can grow rapidly, because it has to split every block to make room for the new entries.
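The page-splitting effect described above can be illustrated with a toy model of index leaf pages. This model is not WiredTiger's actual page format or split policy. Each page holds a hypothetical fixed number of keys, and a full page is split into two halves when a key has to be added to it. The sketch first packs 10,000 keys into completely full pages, which corresponds to a fresh rebuild. It then inserts 1,000 keys scattered uniformly across the key range. The keys here are random. Monotonically increasing keys such as ObjectIds mostly land at the right edge of the index and may behave differently.

```python
import bisect
import random

CAPACITY = 100  # hypothetical number of keys per leaf page


def bulk_build(keys, capacity):
    """Model a freshly (re)built index: sorted keys packed into full pages."""
    keys = sorted(keys)
    pages = [keys[i:i + capacity] for i in range(0, len(keys), capacity)] or [[]]
    # firsts[i] is the smallest key routed to page i; page 0 takes everything below.
    firsts = [float("-inf")] + [page[0] for page in pages[1:]]
    return pages, firsts


def insert_key(pages, firsts, key, capacity):
    """Insert `key` into the page covering it; split the page 50/50 on overflow."""
    i = bisect.bisect_right(firsts, key) - 1
    page = pages[i]
    bisect.insort(page, key)
    if len(page) > capacity:
        mid = len(page) // 2
        pages[i:i + 1] = [page[:mid], page[mid:]]
        firsts.insert(i + 1, pages[i + 1][0])


def fill_factor(pages, capacity):
    """Fraction of allocated key slots that are actually in use."""
    return sum(len(page) for page in pages) / (len(pages) * capacity)


rng = random.Random(42)

# 1) Rebuild: 10,000 existing keys packed into completely full pages.
pages, firsts = bulk_build([rng.random() for _ in range(10_000)], CAPACITY)
pages_before = len(pages)

# 2) 1,000 new keys spread across the whole key range.
for _ in range(1_000):
    insert_key(pages, firsts, rng.random(), CAPACITY)

print(f"pages: {pages_before} -> {len(pages)}, "
      f"fill factor: {fill_factor(pages, CAPACITY):.2f}")
```

In this model, about 10% more keys nearly double the page count. Almost every full page receives at least one new key and is split in two. The resulting pages are only a little over half full, which leaves room for later inserts without further splits.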

So whether the collection was freshly loaded or has been in use for years can make a huge difference. This doesn’t mean that it makes the index less efficient, though. The performance issue could be related to a number of other reasons.

The observations were made on a fresh collection. But it is good to know that no fundamental changes are expected here.
I will try to test this again in isolation, so that it can be reproduced in case of doubt. I still need time for that though. Thanks for now!
