Text search uses too much RAM

Hello,

I use text search in an aggregation pipeline, and it uses too much RAM: 8GB is not enough for a single aggregation call. Are there any optimizations I can make to reduce the memory usage?

Collection metrics:

  • Number of documents: 380 000
  • Size of the collection: ~8GB
  • Size of the text index: 340MB

Database: MongoDB 4.4

Hi @Roman_Right,

Probably. But I don’t see how we could help you without some sample documents, the index definitions, and the pipeline you mentioned.

Also, you are using Atlas Search here, correct?

Cheers,
Maxime.

Hi @MaBeuLux88,

Thank you for your reply.

No, I use stand-alone MongoDB 4.4.

The aggregation pipeline is as follows:

[
    {
        "$match": {
            "$expr": {
                "$eq": ["$tag", "8"]
            },
            "$text": {"$search": "good"}
        }
    },
    {
        "$limit": 10
    }
]

The document schema is {"text": string, "tag": string}.

The "text" field is ~20000 characters long. It can be any text, I think. For the synthetic tests, I used parts of the book "20000 Leagues Under the Sea" and got the same results.

The "tag" field is small (<10 characters).

The text index is created like this: db["my-collection"].createIndex({"text": "text"});

I created a repo with scripts that can reproduce my problem: roman-right/text_index_memory_usage on GitHub

I face this problem only in MongoDB 4.4.

MongoDB 5.0 works well, Atlas (with 5.0 on board) works well too.

Maybe there are specific tweaks for 4.4 that I should use?

Hi @Roman_Right,

Sorry for the delay, I had a baby :)

Are you familiar with the allowDiskUse option?
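For reference, here is a minimal sketch of passing allowDiskUse to the aggregation from this thread, written for the mongo shell / mongosh. The collection name and pipeline are copied from the post above; the `typeof db` guard is only an assumption added so the snippet also loads outside the shell. allowDiskUse lets pipeline stages that hit their memory limit write temporary files to disk; whether that lowers the memory footprint of this particular $text query is not established in this thread.

```javascript
// Pipeline from the question: text search plus a tag filter, first 10 hits.
const pipeline = [
  {
    $match: {
      $expr: { $eq: ["$tag", "8"] },
      $text: { $search: "good" },
    },
  },
  { $limit: 10 },
];

// allowDiskUse lets memory-limited stages spill to temporary files on disk.
const options = { allowDiskUse: true };

// `db` only exists inside the mongo shell / mongosh, so guard the call.
if (typeof db !== "undefined") {
  db["my-collection"].aggregate(pipeline, options).forEach(printjson);
}
```

If the memory usage does not change with the option set, the pressure is probably not coming from a stage that can spill to disk.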

If your cluster has 8 GB of RAM, most of it is already used by the OS, the working set, the indexes and the other queries. Your aggregation can only use whatever RAM is left. Is your cluster constantly maxed out at 8 GB of RAM, or is there some room left for queries and for the cluster to stay healthy?
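One way to check how much room is left is to look at serverStatus from the shell. The sketch below assumes the WiredTiger storage engine (the default in 4.4); `summarizeMemory` is a hypothetical helper name, and the fields it reads are the ones reported by db.serverStatus().

```javascript
// Hypothetical helper: reduce a serverStatus document to a few memory figures (in MB).
function summarizeMemory(status) {
  const cache = status.wiredTiger.cache;
  // Number() is used because the shell may return 64-bit integer wrappers.
  const toMB = (bytes) => Math.round(Number(bytes) / (1024 * 1024));
  return {
    residentMB: Number(status.mem.resident), // serverStatus already reports this in MB
    cacheUsedMB: toMB(cache["bytes currently in the cache"]),
    cacheMaxMB: toMB(cache["maximum bytes configured"]),
  };
}

// `db` only exists inside the mongo shell / mongosh, so guard the call.
if (typeof db !== "undefined") {
  printjson(summarizeMemory(db.serverStatus()));
}
```

Comparing these numbers before and during the aggregation shows whether the cache is already close to its configured maximum when the query starts.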

I’m not sure why there is a difference between 4.4 and 5.0. It could be that your 5.0 deployment isn’t as loaded as the 4.4 one, which I’m guessing is in prod, and therefore has more RAM and resources available.

Also, 5.0 is of course an improvement over 4.4, so some of its changes may improve performance. Maybe it’s time to plan an upgrade and say goodbye to 2020.

Cheers,
Maxime.