Strange behavior of wildcard search on a large collection

Roland_I · May 7, 2025, 11:35pm

Hey, I have a fairly large collection (1M+ documents, each under 1 KB) on an M10 cluster, and I run wildcard queries with filters.

The problem is that the wildcard queries are very slow (1 second or more), even though the filters alone (without the wildcard clause) return only about 10 documents. Consider these two examples:

Query with filters only (returns just 9 (!) documents):

[
  {
    $search: {
      index: "default",
      compound: {
        filter: [
          {
            equals: {
              path: "configId",
              value: ObjectId(
                "64198f01b06df0538d205dcb"
              ),
            },
          },
          {
            equals: {
              path: "languagePair",
              value: "en|fr",
            }
          }
        ],
      },
    },
  },
]

This query runs really fast (< 0.01s) and returns only 9 documents.

Then, I add the wildcard filter to the query above:

[
  {
    $search: {
      index: "default",
      compound: {
        filter: [
          {
            equals: {
              path: "configId",
              value: ObjectId(
                "64198f01b06df0538d205dcb"
              ),
            },
          },
          {
            equals: {
              path: "languagePair",
              value: "en|fr",
            },
          },
          {
            compound: {
              should: [
                {
                  wildcard: {
                    query: "*text*",
                    allowAnalyzedField: true,
                    path: {
                      value: "target",
                      multi: "keywordAnalyzer",
                    },
                  },
                },
                {
                  wildcard: {
                    query: "*text*",
                    allowAnalyzedField: true,
                    path: {
                      value: "source",
                      multi: "keywordAnalyzer",
                    },
                  },
                },
              ],
              minimumShouldMatch: 1,
            },
          },
        ],
      },
    },
  },
]
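For reference, the same pipeline can be built programmatically, which also makes it easy to compare the two queries above. This is a minimal sketch in plain JavaScript (Node.js); `buildSearchPipeline` and its parameter names are hypothetical, and `configId` is assumed to be an `ObjectId` created by the caller (e.g. with the `mongodb` driver). It escapes `*`, `?` and `\` in the user's term, assuming those are the wildcard operator's special characters, so the term itself is matched literally.

```javascript
// Hypothetical helper: builds the $search pipelines shown above.
// `configId` should be an ObjectId supplied by the caller; it is only placed
// into the stage. An empty `term` reproduces the filter-only query.
function buildSearchPipeline({ configId, languagePair, term, fields = ["target", "source"] }) {
  const filter = [
    { equals: { path: "configId", value: configId } },
    { equals: { path: "languagePair", value: languagePair } },
  ];

  if (term) {
    // Escape wildcard metacharacters (assumed: *, ?, \) so the term is literal.
    const escaped = term.replace(/[\\*?]/g, "\\$&");
    // One wildcard clause per field, each against the keyword-analyzed multi.
    const should = fields.map((field) => ({
      wildcard: {
        query: `*${escaped}*`,
        allowAnalyzedField: true,
        path: { value: field, multi: "keywordAnalyzer" },
      },
    }));
    filter.push({ compound: { should, minimumShouldMatch: 1 } });
  }

  return [{ $search: { index: "default", compound: { filter } } }];
}
```

Calling it with `term: "text"` yields the second query; with `term: ""` it yields the first.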

This query takes 1.2 seconds and returns only 1 document (out of the 9).

This seems very strange: why would adding a clause that narrows 9 documents down to 1 make the query more than 100 times slower?

Here’s my Atlas Search index definition:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "_id": {
        "type": "objectId"
      },
      "configId": {
        "type": "objectId"
      },
      "languagePair": [
        {
          "type": "stringFacet"
        },
        {
          "type": "token"
        },
        {
          "analyzer": "keywordLowerer",
          "searchAnalyzer": "keywordLowerer",
          "type": "string"
        }
      ],
      "source": [
        {
          "analyzer": "standardLowerer",
          "multi": {
            "keywordAnalyzer": {
              "analyzer": "keywordLowerer",
              "searchAnalyzer": "keywordLowerer",
              "store": false,
              "type": "string"
            }
          },
          "searchAnalyzer": "standardLowerer",
          "store": false,
          "type": "string"
        }
      ],
      "target": [
        {
          "analyzer": "standardLowerer",
          "multi": {
            "keywordAnalyzer": {
              "analyzer": "keywordLowerer",
              "searchAnalyzer": "keywordLowerer",
              "store": false,
              "type": "string"
            }
          },
          "searchAnalyzer": "standardLowerer",
          "store": false,
          "type": "string"
        }
      ]
    }
  },
  "analyzers": [
    {
      "charFilters": [
        {
          "ignoredTags": [],
          "type": "htmlStrip"
        },
        {
          "mappings": {
            "\"": " ",
            "'": " ",
            "`": " ",
            "‘": " ",
            "’": " ",
            "“": " ",
            "”": " ",
            "„": " "
          },
          "type": "mapping"
        }
      ],
      "name": "standardLowerer",
      "tokenFilters": [
        {
          "type": "lowercase"
        }
      ],
      "tokenizer": {
        "type": "standard"
      }
    },
    {
      "charFilters": [
        {
          "ignoredTags": [],
          "type": "htmlStrip"
        },
        {
          "mappings": {
            "\"": " ",
            "'": " ",
            "`": " ",
            "‘": " ",
            "’": " ",
            "“": " ",
            "”": " ",
            "„": " "
          },
          "type": "mapping"
        }
      ],
      "name": "keywordLowerer",
      "tokenFilters": [
        {
          "type": "lowercase"
        }
      ],
      "tokenizer": {
        "type": "keyword"
      }
    }
  ]
}

amyjian · May 8, 2025, 4:29pm

Hi @Roland_I, I recommend taking a look at the explain plan for the query. It can help you understand where the query engine spends the most time and point to possible optimizations. Feel free to share the output here.
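As a starting point, here is a minimal sketch of retrieving that explain output from Node.js, assuming `collection` is a `Collection` from the official `mongodb` driver (`explainSearch` is a hypothetical helper name). In mongosh the equivalent is `db.<collection>.explain("executionStats").aggregate(pipeline)`. The `executionStats` verbosity actually runs the query and should report timing for each clause, which would show whether the time goes into the wildcard clauses.

```javascript
// Minimal sketch: return the explain plan for an aggregation pipeline.
// Assumes `collection` is a Collection from the official `mongodb` Node.js
// driver, whose aggregation cursors expose explain(verbosity).
async function explainSearch(collection, pipeline, verbosity = "executionStats") {
  const cursor = collection.aggregate(pipeline);
  // "executionStats" executes the query and includes timing information,
  // unlike "queryPlanner", which only describes the plan.
  return cursor.explain(verbosity);
}
```

The returned document can then be printed with `JSON.stringify(plan, null, 2)` and pasted into the thread.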