Error querying Atlas index

Hi, I’m in the process of migrating the search in an application from Elastic to MongoDB Atlas, and the results have been good.

I’m now attempting to replicate some of the features I’ve been using in Elastic, such as the analyzers:

{
  "analyzer": "lucene.standard",
  "searchAnalyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "attributes": {
        "dynamic": true,
        "type": "document"
      },
      "createdAt": {
        "type": "date"
      },
      "isActive": {
        "type": "boolean"
      },
      "isFavourite": {
        "type": "boolean"
      },
      "note": {
        "analyzer": "htmlStrippingAnalyzer",
        "type": "string"
      },
      "title": {
        "multi": {
          "keywordAnalyzer": {
            "analyzer": "ngramShingler",
            "type": "string"
          }
        },
        "type": "string"
      },
      "typeOfAsset": {
        "type": "string"
      },
      "updatedAt": {
        "type": "date"
      }
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "ngramShingler",
      "tokenFilters": [
        {
          "maxShingleSize": 3,
          "minShingleSize": 2,
          "type": "shingle"
        }
      ],
      "tokenizer": {
        "maxGram": 5,
        "minGram": 2,
        "type": "nGram"
      }
    },
    {
      "charFilters": [
        {
          "ignoredTags": [
            "a",
            "div",
            "p",
            "strong",
            "em",
            "img",
            "figure",
            "figcaption",
            "ol",
            "ul",
            "li",
            "span"
          ],
          "type": "htmlStrip"
        }
      ],
      "name": "htmlStrippingAnalyzer",
      "tokenFilters": [],
      "tokenizer": {
        "type": "standard"
      }
    }
  ]
}

… and the code in Node is:


return new Promise( async (resolve, reject) => {
  try {

    const search = {
      $search: {
        index: 'assets',
        compound: { 
          should: [{
            text: {
              query: args.phraseToSearch,
              path: [{ value: 'title', multi: 'keywordAnalyzer' }],
              score: { boost: { value: 3 } }
            }
          }, {
            text: {
              query: args.phraseToSearch,
              path: 'note'
            }
          }]
        }
      }
    }

    const project = {
      $project: {
        _id: 0,
        id: '$_id',
        userId: 1,
        folderId: 1,
        title: 1,
        note: 1,
        typeOfAsset: 1,
        isFavourite: 1,
        createdAt: 1,
        updatedAt: 1,
        isActive: 1,
        attributes: 1,
        preferences: 1,
        score: {
          $meta: 'searchScore'
        }
      }
    }

    const match = {
      $match: {
        userId: args.userId
      }
    }

    const skip = {
      $skip: args.skip
    }

    const limit = {
      $limit: args.first
    }

    const group = {
      $group: {
        _id: null,
        count: { $sum: 1 }
      }
    }

    const sort = {
      $sort: {
        [args.orderBy]: args.orderDirection === 'asc' ? 1 : -1
      }
    }

    const searchAllAssets = await Models.Assets.schema.aggregate([
      search, project, match, sort, skip, limit
    ])

    const [ totalNumberOfAssets ] = await Models.Assets.schema.aggregate([
      search, project, sort, match, group
    ])

    return await resolve({
      searchAllAssets: searchAllAssets,
      totalNumberOfAssets: totalNumberOfAssets.count
    })

  } catch (exception) {
    return reject(new Error(exception))
  }
})

When I use the search I get the following error:

[GraphQL error]: Message: MongoServerError: PlanExecutor error during aggregation :: caused by :: Remote error from mongot :: caused by :: query has expanded into too many sub-queries internally: maxClauseCount is set to 1024

I’ve Googled maxClauseCount but found nothing useful.

I don’t have a lot of experience debugging queries (I’m using the Compass client and occasionally use mongosh), and it’s possible I’ve got something wrong with the index (I’ve copied and pasted the two analyzers from the documentation and then made a few tweaks).

Any advice would be much appreciated.

I ran the following statement in mongosh within Compass:

db.assets.aggregate([
  {
    $search: {
      index: 'assets',
      compound: { 
        should: [{
          text: {
            query: 'machine learning',
            path: [{ value: 'title', multi: 'keywordAnalyzer' }],
            score: { boost: { value: 3 } }
          }
        }, {
          text: {
            query: 'machine learning',
            path: 'note'
          }
        }]
      }
    }
  }
])

So that at least rules out the remaining stages of the aggregation pipeline.
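
To narrow it down further, I’m thinking of running each clause on its own (I haven’t tried this yet) to see whether it’s the multi path on title or the plain note path that triggers the expansion, along the lines of:

db.assets.aggregate([
  {
    $search: {
      index: 'assets',
      text: {
        query: 'machine learning',
        path: { value: 'title', multi: 'keywordAnalyzer' }
      }
    }
  },
  { $limit: 1 }
])

… and then the same again with only path: 'note'.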

I’m still struggling with this problem, and it seems to be the index, which I’ve since tweaked:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "createdAt": {
        "type": "date"
      },
      "isActive": {
        "type": "number"
      },
      "isFavourite": {
        "type": "boolean"
      },
      "note": {
        "analyzer": "htmlStrippingAnalyzer",
        "searchAnalyzer": "htmlStrippingAnalyzer",
        "type": "string"
      },
      "title": {
        "analyzer": "lucene.keyword",
        "multi": {
          "keywordAnalyzer": {
            "analyzer": "ngramShingler",
            "searchAnalyzer": "ngramShingler",
            "type": "string"
          }
        },
        "searchAnalyzer": "lucene.keyword",
        "type": "string"
      },
      "typeOfAsset": {
        "type": "string"
      },
      "updatedAt": {
        "type": "date"
      }
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "ngramShingler",
      "tokenizer": {
        "type": "standard"
      },
      "tokenFilters": [
        {
          "type": "englishPossessive"
        },
        {
          "type": "nGram",
          "minGram": 4,
          "maxGram": 7
        }
      ]
    },
    {
      "charFilters": [
        {
          "ignoredTags": [
            "a",
            "div",
            "p",
            "strong",
            "em",
            "img",
            "figure",
            "figcaption",
            "ol",
            "ul",
            "li",
            "span"
          ],
          "type": "htmlStrip"
        }
      ],
      "name": "htmlStrippingAnalyzer"
    }
  ]
}

The problem is that this index also fails in the Search Tester, in spite of my having copied and pasted the analyzer definitions from the official documentation when creating it.

Hi @Wayne_Smallman,

Thanks for providing the search details and the error message. I assume you’re getting this error against an M0, M2 or M5 tier cluster (please correct me if I’m wrong here). This assumption is based on the Atlas Search M0 (Free Cluster), M2, and M5 Limitations documentation, which states:

  • Lucene’s default clause limit of 1024 applies to any BooleanQuery created for searches.

More details on the clause error and its possible cause(s) can be found in the Lucene documentation here.
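
As a rough illustration of how quickly the clauses can add up (this is only an estimate; the exact number of clauses mongot builds from a query is internal to the search engine): an nGram token filter emits every substring of each token that is between minGram and maxGram characters long, and each emitted term can end up as another clause in the underlying BooleanQuery. A quick sketch in Node, using the minGram 4 / maxGram 7 range from your latest index:

// Illustrative only: how many n-gram terms a single token expands into
// for a given minGram/maxGram range (e.g. the nGram token filter in your index)
const nGramTerms = (token, minGram, maxGram) => {
  let count = 0
  for (let n = minGram; n <= Math.min(maxGram, token.length); n++) {
    count += token.length - n + 1
  }
  return count
}

console.log(nGramTerms('learning', 4, 7)) // 14 terms from one 8-character token

Multiply that across every token in the query, every path in the compound, and any further analysis (shingles, multi fields), and the clause count grows quickly.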

There are a few options you may wish to consider or test out:

  1. Tweak the Atlas Search query and/or index to:
    • Reduce the number of search terms (e.g. from “Generic Error” to “Generic”)
    • Raise the minGram value from 4 to 5 (or higher)
    • Change from nGram to edgeGram (see the sketch after this list)
    • Any combination of the above
  2. Create a new M10 or higher tier cluster, import the same data as the environment where the error is being generated, and re-create the same Atlas Search index.
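
As a rough sketch of the index-side part of option 1 (using your custom analyzer as the starting point; the analyzer name and the exact gram values here are only examples, not a recommendation), swapping the nGram token filter for edgeGram and raising minGram might look something like:

{
  "charFilters": [],
  "name": "ngramShingler",
  "tokenizer": {
    "type": "standard"
  },
  "tokenFilters": [
    {
      "type": "englishPossessive"
    },
    {
      "type": "edgeGram",
      "minGram": 5,
      "maxGram": 7
    }
  ]
}

Because edgeGram only emits grams anchored at the start of each token, it generates far fewer terms per token than nGram over the same range.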

Just to dive a bit deeper into the issue, could you also advise what search terms you’re using that are generating this error?

Regards,
Jason

Hi @Jason_Tran, much appreciated!

