Autocomplete with unique results - multi tenant

I use an autocomplete index together with a filter inside compound so that I only get results that belong to a specific customer. However, the result set includes duplicate items.

To get unique values, I have seen $searchMeta suggested. But in $searchMeta I can’t seem to provide both an operator like equals and autocomplete, so I can’t filter by customer inside equals the way I do in $search. Given that a customer may have 1-2 million documents in the collection, what would be the most efficient way to get unique values for autocomplete, along with skip and limit for pagination?

Note: Filtering the customer in the beginning is important.

Atlas atlas-cihc7e-shard-0 [primary] test> db.products.find()
[
  {
    _id: ObjectId("6526396b928f922719d4fa65"),
    vendor: 'xyz',
    customer: ObjectId("6526396b928f922719d4fa69")
  },
  {
    _id: ObjectId("6526396b928f922719d4fa66"),
    vendor: 'abc',
    customer: ObjectId("6526396b928f922719d4fa79")
  },
  {
    _id: ObjectId("6526396b928f922719d4fa67"),
    vendor: 'abc',
    customer: ObjectId("6526396b928f922719d4fa79")
  },
  {
    _id: ObjectId("6526396b928f922719d4fa68"),
    vendor: 'xyz',
    customer: ObjectId("6526396b928f922719d4fa69")
  }
]

The query below returns duplicate results. Users will pick from the vendor filter on the frontend, so I need to show unique values here.

db.getCollection("s_products").aggregate([
  {
    "$search": {
      "index": "autocomplete",
      "highlight": { "path": "vendor" },
      "compound": {
        "must": {
          "autocomplete": {
            "path": "vendor",
            "query": "mer",
            "tokenOrder": "sequential"
          }
        },
        "filter": {
          "equals": {
            "path": "customer",
            "value": ObjectId("6526396b928f922719d4fa69")
          }
        }
      }
    }
  },
  {
    "$project": {
      "score": { "$meta": "searchScore" },
      "_id": 0,
      "vendor": 1,
      "customer": 1,
      "highlights": { "$meta": "searchHighlights" }
    }
  },
  { "$skip": 0 },
  { "$limit": 10 }
])

Hi there, can you share a copy of your search index definition and the specific duplicates that you are seeing in the query results? This will help us understand what is going on.

Additionally, it would also be helpful to know what type of end user experience you are trying to achieve by using autocomplete. Autocomplete is helpful for returning results which partially match a query, e.g. searching for “xy” will return results that contain “xyz”. However, if you are looking to exactly match what’s in the document to the query, then there are alternate solutions that could be better suited for your use case. You can read more in this blog. Let me know if this helps!

Hey @amyjian ,

Users create alerts on our system after filtering their product set. We have fields such as tags (string array), vendor, and title on the product collection. For the title we don’t need any uniqueness, but the same vendor can appear in many products. For tags I have an issue very similar to this one: $searchMeta Facets - Return only filtered elements

If the distinct item count is greater than 5000 for vendor or tag, we switch the select element to server-side mode, where users search with the aid of autocomplete. If it’s around a million, we let them type free text and check whether the input is valid.
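As a rough sketch of that switch (the function name, the mode labels, and the exact one-million cutoff are assumptions for illustration; only the 5000 threshold is the real one):

```javascript
// Hypothetical helper: pick the UI mode for a filter from its distinct-value count.
// The 5000 threshold comes from the description above; the 1,000,000 cutoff is an
// assumed stand-in for "around a million".
const SERVER_SIDE_THRESHOLD = 5000;
const FREE_TEXT_THRESHOLD = 1000000; // assumption

function chooseSelectMode(distinctCount) {
  // Very large value sets: free-text input, validated after the fact.
  if (distinctCount >= FREE_TEXT_THRESHOLD) return "free-text";
  // Large value sets: server-side select backed by autocomplete.
  if (distinctCount > SERVER_SIDE_THRESHOLD) return "server";
  // Small value sets: plain client-side select.
  return "client";
}
```

The autocomplete queries in this thread only matter for the "server" case.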

Index definition (the collection has 16 million documents):

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "customer": {
        "type": "objectId"
      },
      "tag_list": [
        {
          "minGrams": 3,
          "type": "autocomplete"
        },
        {
          "type": "stringFacet"
        }
      ],
      "title": {
        "minGrams": 3,
        "type": "autocomplete"
      },
      "vendor": [
        {
          "minGrams": 3,
          "type": "autocomplete"
        },
        {
          "type": "stringFacet"
        }
      ]
    }
  }
}

db.getCollection("s_products").aggregate([
  {
    "$search": {
      "index": "autocomplete",
      "highlight": { "path": "title" },
      "compound": {
        "must": {
          "autocomplete": {
            "path": "title",
            "query": "59",
            "tokenOrder": "sequential"
          }
        },
        "filter": {
          "equals": {
            "path": "customer",
            "value": customer_id // Replace with actual customer_id
          }
        }
      }
    }
  },
  {
    "$project": {
      "score": { "$meta": "searchScore" },
      "_id": 1,
      "title": 1,
      "customer": 1,
      "highlights": { "$meta": "searchHighlights" }
    }
  },
  { "$skip": 0 },
  { "$limit": 10 }
])

This sample query returns two different documents with the same title, which is fine because we need to know which one was selected, along with its _id. But for the vendor and tag filters I need to show unique values, because there are so many duplicates that showing them all would be noisy. In the end we only need the selected string values. So if a user searches for “xy”, we should show only one “xyz”.

Hi @Mete , did you try a $searchMeta query to get the unique values for vendor and tags? You can nest the compound operator inside the operator clause of facet, like so:

{
    "$searchMeta": {
      "index": "autocomplete",
      "facet": {
        "operator": {
          "compound": {
            "must": [
              {
                "autocomplete": {
                  "path": "vendor",
                  "query": "mer"
                }
              }
            ],
            "filter": [
              {
                "equals": {
                  "value": <customer id>,
                  "path": "customer"
                }
              }
            ]
          }
        },
        "facets": {
          "tagFacet": {
            "path": "tags",
            "type": "string"
          }
        }
      }
    }
  }
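
For unique vendor values specifically, a sketch along the same lines would facet on vendor instead. Here buildVendorFacetStage and the vendorFacet name are made up for illustration, and the index and field names are taken from your query. numBuckets is the string facet option that caps how many distinct values come back (default 10, maximum 1000), so this does not give you skip-style pagination.

```javascript
// Hypothetical helper: build a $searchMeta stage that returns distinct vendor
// values for one customer. customerId is expected to be the customer's ObjectId.
function buildVendorFacetStage(customerId, query, numBuckets = 100) {
  return {
    $searchMeta: {
      index: "autocomplete",
      facet: {
        // Same compound as in $search: autocomplete match plus customer filter.
        operator: {
          compound: {
            must: [{ autocomplete: { path: "vendor", query: query } }],
            filter: [{ equals: { path: "customer", value: customerId } }]
          }
        },
        // One bucket per distinct vendor among the matching documents.
        facets: {
          vendorFacet: { type: "string", path: "vendor", numBuckets: numBuckets }
        }
      }
    }
  };
}
```

Running db.getCollection("s_products").aggregate([buildVendorFacetStage(customerId, "mer")]) should return a single document whose facet.vendorFacet.buckets array holds one { _id, count } entry per distinct vendor. For the tag field, the facet path would need to match the name in your index definition, which appears to be tag_list.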

Regarding $searchMeta Facets - Return only filtered elements - we do not currently support limiting facet results to the array elements which match the search criteria. If this is a requirement for you, you may want to consider using the $group aggregation stage instead. You can also open a new request in our feedback portal to support this in Atlas Search.
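
A minimal sketch of the $group approach, assuming the same index and field names as in your query. buildUniqueVendorPipeline is a hypothetical helper, not part of any driver. Note that $group has to consume every document matched by $search before $skip and $limit apply, so it may be slow for customers with millions of matches.

```javascript
// Hypothetical helper: build a pipeline that deduplicates autocomplete hits
// on vendor with $group, then paginates the distinct values.
function buildUniqueVendorPipeline(customerId, query, skip = 0, limit = 10) {
  return [
    {
      $search: {
        index: "autocomplete",
        compound: {
          must: [
            { autocomplete: { path: "vendor", query: query, tokenOrder: "sequential" } }
          ],
          // Restrict to one customer before anything else is scored.
          filter: [{ equals: { path: "customer", value: customerId } }]
        }
      }
    },
    // Keep only what the grouping needs, and capture the search score.
    { $project: { _id: 0, vendor: 1, score: { $meta: "searchScore" } } },
    // One output document per distinct vendor, carrying its best score.
    { $group: { _id: "$vendor", score: { $max: "$score" } } },
    // Deterministic order so that skip/limit pages do not overlap.
    { $sort: { score: -1, _id: 1 } },
    { $skip: skip },
    { $limit: limit },
    { $project: { _id: 0, vendor: "$_id", score: 1 } }
  ];
}
```

It would be used as db.getCollection("s_products").aggregate(buildUniqueVendorPipeline(customerId, "mer", 0, 10)).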