Global Faceting

I’m converting some queries from Elasticsearch to Atlas FTS and have hit a feature I’m not sure MongoDB supports yet in a streamlined way. In Elasticsearch you can run aggregations in a “global” context, meaning they are NOT influenced by the search query. This is useful when a single query should return both search results that ARE refined by the search criteria and aggregations that are NOT affected by those same filters.

For example, I provide a search term of “Food”. This returns results that contain the term “Food” in their name, as well as facet values that themselves contain the term “Food” (not the facet values of the results with “Food” in their name).

The only workaround I can identify at the moment is to send 2 separate requests: one for the results only, and one using $searchMeta for the facets only, so that I control the filtering context of those facets.
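A minimal sketch of the two-request workaround that makes the filter context of each request explicit. `buildGlobalFacetRequests`, the `filters` argument, and the field names are hypothetical; it assumes the `facet` collector's `operator` is optional, so that leaving it out facets over the whole collection. It only builds the pipelines and does not talk to a server.

```javascript
// Hypothetical helper: builds the two pipelines used by the two-request workaround.
// `term` narrows the hits only; `filters` (Atlas Search operators) scope both.
function buildGlobalFacetRequests(term, filters, facetFields) {
  // Request 1: hits, refined by the search term and any shared filters.
  const compound = {
    should: [
      { autocomplete: { query: term, path: "name" } },
      { text: { query: term, path: "bio" } },
    ],
    minimumShouldMatch: 1,
  };
  if (filters.length > 0) {
    compound.filter = filters;
  }
  const searchPipeline = [{ $search: { compound } }];

  // Request 2: facets only. The search term is deliberately left out,
  // so the buckets are not limited to the documents that matched it.
  const facets = {};
  for (const field of facetFields) {
    facets[field] = { type: "string", path: field, numBuckets: 100 };
  }
  const facet = { facets };
  if (filters.length > 0) {
    // Shared filters still apply, similar to a filter inside a global aggregation.
    facet.operator = { compound: { filter: filters } };
  }
  // Without an operator, the facet collector counts over the whole collection.
  const facetPipeline = [{ $searchMeta: { facet } }];

  return { searchPipeline, facetPipeline };
}
```

With the Node.js driver, the two pipelines could then be sent in parallel, e.g. `Promise.all([coll.aggregate(searchPipeline).toArray(), coll.aggregate(facetPipeline).toArray()])`, and the results merged into one response object.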

Hope this makes sense, any feedback appreciated :slight_smile:

Hi @Luke_Snyder :wave:

Just to clarify: would the following Elasticsearch documentation be correct with regards to the global aggregations you mentioned?

Would you mind sharing some sample documents as well as the output you achieved using this workaround? This will give me a better idea of what you’re after. Please redact any sensitive or personal information before posting here.

Look forward to hearing from you!

Regards,
Jason

Yes, that is the correct Elasticsearch documentation :+1: It basically lets you facet over the entire index as a whole, without being subject to the filters in the “search” portion of the request.

Below are some examples of the workaround and what the results end up looking like. I’ve simplified the example and removed anything extraneous, so it might not work exactly 1:1 as the code I’m providing.

Sample Docs:

{
  name: "Leanne Smith",
  bio: "Test Bio",
  interests: ["Soccer", "Food"]
}

{
  name: "Mike Rob",
  bio: "Text goes here",
  interests: ["Leadership"]
}

Sample $search request:

{
        "$search": {
            "compound": {
                "should": [
                    {
                        "autocomplete": {
                            "query": "lea",
                            "path": "name"
                        }
                    },
                    {
                        "text": {
                            "query": "lea",
                            "path": "bio"
                        }
                    }
                ],
                "minimumShouldMatch": 1
            }
        }
    }

Sample $searchMeta request:

[
    {
        "$searchMeta": {
            "facet": {
                "operator": {
                    "compound": {
                        "filter": [],
                        "should": [],
                        "minimumShouldMatch": 0
                    }
                },
                "facets": {
                    "interests": {
                        "type": "string",
                        "path": "interests",
                        "numBuckets": 100
                    }
                }
            }
        }
    },
    {
        "$facet": {
            "facets": [
                {
                    "$replaceRoot": {
                        "newRoot": {
                            "interests": {
                                "$filter": {
                                    "input": "$facet.interests.buckets",
                                    "as": "interest",
                                    "cond": {
                                        "$regexMatch": {
                                            "input": "$$interest._id",
                                            "regex": "([Ll][Ee][Aa]).*|([Ll][\\. ]*[Ee][\\. ]*[Aa][\\. ]*).*|(.*[^a-zA-Z0-9][Ll][Ee][Aa].*)",
                                            "options": "i"
                                        }
                                    }
                                }
                            }
                        }
                    }
                },
                {
                    "$limit": 1
                }
            ]
        }
    }
]

Sample Output:

{
  "hits": [
    {
      name: "Leanne Smith",
      bio: "Test Bio",
      interests: ["Soccer", "Food"]
    }
  ],
  "facets": {
    "interests": {
      "Leadership": 1
    }
  }
}

So, the big issue is that the autocomplete and text clauses from the $search request cannot be part of the same request I’m faceting in. That would limit my facet results to ONLY the documents that meet those clauses, meaning I would never get the interests value Leadership.
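To make the scoping problem concrete, here is a small in-memory illustration in plain JavaScript (no MongoDB involved) using the two sample docs. The `matchesTerm` word-prefix check is a hypothetical stand-in for the autocomplete/text clauses, not how Atlas Search actually tokenizes text.

```javascript
// Illustration only: counts string facet values in plain JS to show why
// faceting inside the $search request can never surface "Leadership".
const docs = [
  { name: "Leanne Smith", bio: "Test Bio", interests: ["Soccer", "Food"] },
  { name: "Mike Rob", bio: "Text goes here", interests: ["Leadership"] },
];

// Hypothetical stand-in for the search clauses: does any word start with `term`?
function matchesTerm(value, term) {
  return value
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .some((word) => word.startsWith(term.toLowerCase()));
}

// Count occurrences of each value of `field` across a set of documents.
function countValues(documents, field) {
  const counts = {};
  for (const doc of documents) {
    for (const value of doc[field] ?? []) {
      counts[value] = (counts[value] ?? 0) + 1;
    }
  }
  return counts;
}

// Keep only the facet values that themselves match the term.
function filterCounts(counts, term) {
  return Object.fromEntries(Object.entries(counts).filter(([value]) => matchesTerm(value, term)));
}

const term = "lea";
const hits = docs.filter((d) => matchesTerm(d.name, term) || matchesTerm(d.bio, term));

// Scoped: facets computed only over the hits (Leanne), then filtered by the term.
const scopedFacets = filterCounts(countValues(hits, "interests"), term); // → {}
// "Global": facets computed over every document, then filtered by the term.
const globalFacets = filterCounts(countValues(docs, "interests"), term); // → { Leadership: 1 }
```

Only the global variant reproduces the `"Leadership": 1` facet from the sample output, which is why the facets have to come from a request without the search clauses.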

The workaround lets me retrieve facets for all documents, or for a subset of filters that excludes the search criteria. I then use regex filtering to supply the “matches”. That said, I’m realizing now I could probably improve the second request by making the facet values themselves searchable instead of scanning the results with a regex.
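The regex `$filter` step can be generated for several facet fields in a single stage. A sketch, assuming the `$searchMeta` output shape `facet.<field>.buckets`; `buildBucketFilterStage` and `escapeRegex` are hypothetical helpers, and the pattern is simplified to a case-insensitive word-prefix match rather than the original regex.

```javascript
// Escape regex metacharacters so the user's term is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: one $project stage that keeps, for every facet field,
// only the buckets whose value contains a word starting with `term`.
function buildBucketFilterStage(fields, term) {
  const regex = "(^|[^a-zA-Z0-9])" + escapeRegex(term);
  const projection = {};
  for (const field of fields) {
    projection[field] = {
      $filter: {
        input: `$facet.${field}.buckets`,
        as: "bucket",
        cond: { $regexMatch: { input: "$$bucket._id", regex, options: "i" } },
      },
    };
  }
  return { $project: projection };
}
```

The resulting stage would follow the `$searchMeta` stage in place of the `$facet`/`$replaceRoot` combination, keeping each field’s matches under its own key.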

Hope this makes sense, please let me know if that provides clarity.

That makes sense, thanks for clarifying and providing all those details. I also assume your sample output is the result of the 2 individual requests combined, but please correct me if I’m wrong here.

Would something like below work for you as an alternative to the $searchMeta and $facet aggregation pipeline you provided? (i.e. your second request):

db.search.aggregate([
  {
    '$searchMeta': {
      facet: {
        operator: { autocomplete: { query: 'lea', path: 'interests' } },
        facets: {
          interests: { type: 'string', path: 'interests', numBuckets: 100 }
        }
      }
    }
  }
])

Output:

{
    count: { lowerBound: Long("1") },
    facet: {
      interests: { buckets: [ { _id: 'Leadership', count: Long("1") } ] }
    }
}

Regards,
Jason

Correct, the output I supplied is a combination of the 2 requests which we return as a single object to the user.

The suggestion you provided would work for a single facet field, but we often return facet results for up to 10-15 fields in a single request. If you adjusted your code for that, the operator would become a compound operator with a bunch of should clauses spanning the various fields, and there would be no way to tell which facet bucket a match belongs in. For example:


Sample Docs:

{
  name: "Leanne Smith",
  bio: "Test Bio",
  interests: ["Soccer", "Food", "Basket Weaving"],
  sports: ["Basketball"],
  languages: ["English", "Spanish"]
}

{
  name: "Mike Rob",
  bio: "Text goes here",
  interests: ["Leadership"],
  sports: ["Hockey"],
  languages: ["Bavarian"]
}

Sample $searchMeta request:

{
  '$searchMeta': {
    facet: {
      operator: { 
         compound: {
            should: [
               { autocomplete: { query: 'ba', path: 'interests' } },
               { autocomplete: { query: 'ba', path: 'languages' } },
               { autocomplete: { query: 'ba', path: 'sports' } }
            ]
         }
      },
      facets: {
        interests: { type: 'string', path: 'interests', numBuckets: 100 },
        languages: { type: 'string', path: 'languages', numBuckets: 100 },
        sports: { type: 'string', path: 'sports', numBuckets: 100 }
      }
    }
  }
}

I would expect the output to look like this, containing ALL values on the matched documents for the faceted fields, since the autocomplete only narrows down which documents match and the facets are built from the values contained in those docs. With the regex, the actual filtering happens on the FACET VALUES themselves after they’ve been returned. I believe that is why I did it the way I did. Please correct me if I’m wrong.

facet: {
      interests: { buckets: [
         { _id: 'Leadership', count: Long("1") } ,
         { _id: 'Basket Weaving', count: Long("1") },
         { _id: 'Soccer', count: Long("1") } ,
         { _id: 'Food', count: Long("1") } 
      ] },
      sports: { buckets: [
         { _id: 'Basketball', count: Long("1") },
         { _id: 'Hockey', count: Long("1") } 
      ] },
      languages: { buckets: [
         { _id: 'Bavarian', count: Long("1") } ,
         { _id: 'English', count: Long("1") } ,
         { _id: 'Spanish', count: Long("1") } 
      ] },
    }
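Filtering the returned facet values per field, instead of narrowing the documents, keeps every match attributable to its facet. A client-side sketch over the expected output above (counts written as plain numbers rather than `Long`); `filterFacetBuckets` is a hypothetical helper using the same simplified word-prefix match as a stand-in for the regex.

```javascript
// Hypothetical helper: filter the facet *values* returned by $searchMeta,
// keeping the field -> buckets structure so each match stays attributable.
function filterFacetBuckets(facet, term) {
  const needle = term.toLowerCase();
  const result = {};
  for (const [field, { buckets }] of Object.entries(facet)) {
    result[field] = buckets.filter((b) =>
      String(b._id).toLowerCase().split(/[^a-z0-9]+/).some((w) => w.startsWith(needle))
    );
  }
  return result;
}

// Facet output from the example above.
const facet = {
  interests: { buckets: [
    { _id: "Leadership", count: 1 },
    { _id: "Basket Weaving", count: 1 },
    { _id: "Soccer", count: 1 },
    { _id: "Food", count: 1 },
  ] },
  sports: { buckets: [{ _id: "Basketball", count: 1 }, { _id: "Hockey", count: 1 }] },
  languages: { buckets: [
    { _id: "Bavarian", count: 1 },
    { _id: "English", count: 1 },
    { _id: "Spanish", count: 1 },
  ] },
};

const filtered = filterFacetBuckets(facet, "ba");
// filtered.interests → [{ _id: "Basket Weaving", count: 1 }]
// filtered.sports    → [{ _id: "Basketball", count: 1 }]
// filtered.languages → [{ _id: "Bavarian", count: 1 }]
```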

Ah yes, there is currently a feedback post about using autocomplete across multiple fields, which you can vote for.

In this case you could create another feedback post about adding something like the Elasticsearch global aggregation. I will also check with the team whether there are any other workarounds that may help in the meantime.

Regards,
Jason

Hi Luke,

From what I know, there isn’t anything directly available in Atlas Search currently that matches the global aggregation in Elasticsearch, but you could create a feedback post with your use case details, which others can then vote for.

In terms of other workarounds, $unionWith might work for you (it also works on the same collection), but it requires MongoDB version 6.0 or higher.
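A sketch of the $unionWith idea, assuming MongoDB 6.0+ and that `$searchMeta` is accepted as the first stage of the `$unionWith` sub-pipeline. The `_kind` marker field and both helper names are hypothetical. Hits and the facet document come back in one cursor and are split apart client-side.

```javascript
// Hypothetical helper: one aggregation returning hits plus a single facet
// document, via $unionWith on the same collection.
function buildUnionPipeline(collName, searchStage, searchMetaStage, limit) {
  return [
    searchStage, // e.g. the $search request from earlier in the thread
    { $limit: limit },
    {
      $unionWith: {
        coll: collName,
        pipeline: [
          searchMetaStage, // facets built without the search clauses
          // Tag the metadata document so it can be told apart from the hits.
          { $addFields: { _kind: "facets" } },
        ],
      },
    },
  ];
}

// Split the combined cursor back into the { hits, facets } shape returned to the user.
function splitUnionResults(docs) {
  const hits = [];
  let facets = null;
  for (const doc of docs) {
    if (doc._kind === "facets") {
      facets = doc.facet;
    } else {
      hits.push(doc);
    }
  }
  return { hits, facets };
}
```

This saves a round trip from the application, although the server still runs both searches.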

Regards,
Jason