I’m trying to debug some general performance issues - and in looking into these I found I didn’t really understand how Mongo caches indexes - and particularly how/why it would evict them.
In particular, why would an index in active use have only around 0.5% of its data in the cache? It’s used ~400 times/minute - with broad and unpredictable access (often a single index key lookup, but sometimes a somewhat broad scan of 50k+ keys).
From what I can tell from the stats (bytes read into cache), we read the entire index into the cache about twice an hour - which I guess isn’t bad - but it does seem like we probably need to go to disk every time.
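For reference, a sketch of how I’m pulling those per-index cache numbers - collection and index names here are placeholders, and I’m relying on `collStats` with `indexDetails` exposing WiredTiger’s per-index cache statistics:

```javascript
// Placeholder collection/index names - substitute the real ones.
// collStats with indexDetails returns the WiredTiger stats for each index,
// including the "cache" section the numbers above come from.
const stats = db.runCommand({ collStats: "myCollection", indexDetails: true });
const idxCache = stats.indexDetails["myField_1"].cache;

printjson({
  bytesCurrentlyInCache: idxCache["bytes currently in the cache"],
  bytesReadIntoCache:    idxCache["bytes read into cache"],
  pagesReadIntoCache:    idxCache["pages read into cache"],
  pagesRequested:        idxCache["pages requested from the cache"],
});
```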
The reason it’s concerning is that we found we were hitting our disk IOPS limit - queries against this collection (using the index) that should have been fast sometimes took upwards of a minute to run. We temporarily mitigated this by upgrading the cluster, but we’re not really happy with that solution, given that it seems like it shouldn’t have been necessary.
Some specific questions:
- For an index that’s hit so frequently and without any predictable access pattern - why would such a tiny portion of it be stored in memory?
- Am I reading the metrics correctly? I’m aware I’m only looking at the data in the WiredTiger cache, which is roughly 50% of RAM
- Is this behaviour tunable?
A bit of background:
It’s running on Mongo 6 on an M60 Atlas cluster.
The entire DB is about 340GB - most of the large data is extremely infrequently accessed and has excellent temporal locality.
The one collection in question has a `size` of ~35GB (6GB compressed, 9GB with indexes). In the cache right now there’s 11GB of data, but the index has only 1MB (out of ~250MB).
MongoDB uses the WiredTiger storage engine, which keeps two main kinds of content in its cache:
- Collection data: recently accessed documents.
- Indexes: recently used portions of indexes.
The amount of memory available for caching is limited by the WiredTiger cache size, which in your case is likely the Atlas default: 50% of (RAM - 1GB).
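If it helps, a quick sketch of how to check the overall cache size and usage from mongosh - the field names are the standard WiredTiger statistic strings reported by serverStatus:

```javascript
// Overall WiredTiger cache picture for the node you're connected to.
const wt = db.serverStatus().wiredTiger.cache;

printjson({
  maxBytesConfigured: wt["maximum bytes configured"],
  bytesInCache:       wt["bytes currently in the cache"],
  dirtyBytes:         wt["tracked dirty bytes in the cache"],
  bytesReadIn:        wt["bytes read into cache"],
  bytesWrittenOut:    wt["bytes written from cache"],
});
```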
Why would such a tiny portion of the index be stored in memory, despite its frequent usage?
- I suggest you investigate possible cache pressure. Your data and indexes compete for the same space in memory, so pressure to keep frequently used data resident may be pushing the index pages out.
Am I reading the metrics correctly?
- Yes, based on your description, it sounds like you’re interpreting the metrics correctly. WiredTiger evicts items from the cache when it’s under pressure.
Is this behaviour tunable?
- I believe so.
First of all, ensure that your indexes are properly optimized for your queries, especially those involving broad scans. If your queries require large range scans, or if you’re missing a compound index, MongoDB may read more data into memory than necessary, which adds to the cache pressure.
In Atlas, you can review the Performance Advisor tab, which suggests performance optimizations, and use the Query Profiler in the Query Insights tab to see which queries have degraded performance.
As personal advice, use hint() where possible to specify in your code which index to use - I’ve had problems with the query planner choosing a bad plan for a query. See the sketch below.
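A minimal sketch of what I mean, with hypothetical collection, field, and index names:

```javascript
// Force the planner to use a specific index (by name, or by key pattern)
// and check the winning plan and work done with explain.
db.orders
  .find({ customerId: 12345, createdAt: { $gte: ISODate("2024-01-01") } })
  .hint("customerId_1_createdAt_1")
  .explain("executionStats");
```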
I hope that it helps you on your investigation!
Best Regards,
Thanks for the reply - but the index is being used as expected, called 400 times per minute - I can’t fathom that it’s more efficient to load that index into the cache 400/min than it is to evict something else (e.g., the 30% of the entire collection currently occupying cache).
In regards to tunable behaviour - is there a way to hint to mongo to prefer to retain indexes in the cache?
An index that is frequently accessed isn’t always the right index for a given query. For example:
- Range queries, compound indexes with improperly ordered fields, or large result sets can cause an index to perform poorly, leading to inefficiencies even if it’s frequently accessed.
- If the index doesn’t provide high selectivity (i.e., it’s not filtering out a significant amount of data), it could be adding overhead rather than speeding up the query. This could lead to MongoDB frequently evicting it in favor of data that’s more useful to the overall workload.
This can explain why MongoDB might prioritize keeping data (documents) in the cache over retaining that index. How you build indexes can dramatically affect their efficiency; here I believe one of the most important topics in MongoDB is the ESR (Equality, Sort, Range) rule - see the sketch below.
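A minimal sketch of the ESR idea, with hypothetical fields:

```javascript
// ESR: equality field first, then the sort field, then the range field.
db.orders.createIndex({ status: 1, createdAt: -1, amount: 1 });

// A query this index serves well: equality on status, sort on createdAt,
// range on amount - the sort comes from the index (no in-memory sort) and
// the amount filter is applied on index keys before documents are fetched.
db.orders
  .find({ status: "shipped", amount: { $gte: 100 } })
  .sort({ createdAt: -1 });
```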
I appreciate the answer - and I recognise that your priority is probably to push engineers to implement efficient indexes - which is great, but my goal (right now) is to understand why Mongo behaves this way.
Can you point me at any documentation around how Mongo, or WiredTiger - which I’m quite familiar with - chooses what to evict from the cache? Is it a pure LRU, or is there some other heuristic?