How to solve this NLP search scenario

Misagh_Jebeli · September 9, 2022, 1:17pm

Hi there,

We are using Atlas basic text search. We are trying to compare items from list A to list B, and sometimes we get matches that make sense by text search standards but aren’t really the same item in the real world and shouldn’t match.

For instance:

LIST A: SAGE
LIST B: Sage Palm, Sausage, garlic.

So the result is that SAGE matches Sage Palm and Sausage even though they are not the same item.

I was thinking of using synonyms and building an array of all possible permutations of each item and then comparing the whole phrase against it. For instance:

GARLIC : [GARLIC POWDER, GARLIC, GARLIC SALT]
SAGE PALM: [SAGE PALM]
SAUSAGE: [SAUSAGE, SAUSAGE LINKS, CHICKEN SAUSAGE, …]

Appreciate any feedback

Elle_Shwer · September 9, 2022, 1:25pm

Hey there! Can you share the query and index definition you used for this?

Are you looking for exact matches? You may find some inspiration from this blog.

Misagh_Jebeli · September 9, 2022, 2:08pm

Hi @Elle_Shwer,

The search is tricky. I hope these examples make sense:

It is OK for GARLIC to match GARLIC POWDER or GARLIC OIL (and vice versa).
It is not OK for SAGO PALM to match PALM OIL. They have the word PALM in common, but SAGO PALM is a plant that is really poisonous for pets, while PALM OIL is extracted from the OIL PALM tree and is not toxic.

Misagh_Jebeli · September 9, 2022, 2:11pm

Here is the aggregation pipeline we are using:

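// Builds the pipeline that looks up one poison name in the "name" field.
const poisonAggregrate = (poison) => [
  {
    $search: {
      compound: {
        should: [
          {
            // Analyzed text match: any shared token can produce a hit.
            text: {
              query: poison,
              path: "name",
              // With a single clause, the boost does not change the ranking.
              score: {
                boost: {
                  value: 5,
                },
              },
            },
          },
        ],
      },
    },
  },
  {
    // Keep only the highest-scoring document.
    $limit: 1,
  },
];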

Thank you

Elle_Shwer · September 9, 2022, 2:22pm

That’s very interesting. I suspect synonyms would be really helpful here, as you suggested, specifically explicit mappings.
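For illustration, here is a rough sketch of what explicit synonym mappings could look like for the examples in this thread. The documents follow the shape Atlas Search expects in a synonym source collection (`mappingType`, `input`, `synonyms`). The mapping name `ingredientSynonyms` is hypothetical, and the sketch assumes the search index already declares a synonym mapping with that name pointing at the collection holding these documents. Note that synonyms widen what a query term matches; on their own they may not stop token-level matches such as SAGO PALM against PALM OIL.

```javascript
// Hypothetical documents for a synonym source collection.
// With "explicit" mappings, each input term is replaced by the listed synonyms,
// so the input itself must be repeated in `synonyms` to keep matching it.
const synonymMappings = [
  {
    mappingType: "explicit",
    input: ["garlic"],
    synonyms: ["garlic", "garlic powder", "garlic salt"],
  },
  {
    mappingType: "explicit",
    input: ["sausage"],
    synonyms: ["sausage", "sausage links", "chicken sausage"],
  },
];

// Builds a $search stage that applies the synonym mapping by name.
// "ingredientSynonyms" is an assumed name; it has to match the index definition.
const poisonSynonymSearch = (poison) => ({
  $search: {
    text: {
      query: poison,
      path: "name",
      synonyms: "ingredientSynonyms",
    },
  },
});
```

The documents in `synonymMappings` would be inserted into the synonym source collection, and `poisonSynonymSearch(poison)` would take the place of the `$search` stage in the pipeline.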

Will think about this more though and am certainly curious if anyone else in the forum has ideas.

Misagh_Jebeli · November 12, 2022, 11:47am

Hey team,

The synonym array worked. We ended up building an array for each poison item and stored all the possible combinations of the ingredient in it. Common ingredients such as salt, garlic, and onion had 4,000 to 14,000 elements in their arrays.
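A minimal sketch of one way to read this approach, assuming the whole ingredient phrase is compared against each item's array instead of matching individual tokens. The data, `normalize`, `buildLookup`, and `findPoison` are all hypothetical names made up for illustration, not the code we actually ran.

```javascript
// Hypothetical data: each poison item maps to every phrasing that should count as a match.
const poisonVariants = {
  GARLIC: ["GARLIC", "GARLIC POWDER", "GARLIC SALT"],
  SAGE: ["SAGE"],
  "SAGO PALM": ["SAGO PALM"],
};

// Normalize case and whitespace so "garlic  powder" and "GARLIC POWDER" compare equal.
const normalize = (s) => s.trim().toUpperCase().replace(/\s+/g, " ");

// Invert the arrays into a phrase -> item map for constant-time lookups.
const buildLookup = (variantsByItem) => {
  const lookup = new Map();
  for (const [item, variants] of Object.entries(variantsByItem)) {
    for (const variant of variants) {
      lookup.set(normalize(variant), item);
    }
  }
  return lookup;
};

// Returns the poison item only when the whole phrase is a known variant, else null.
const findPoison = (lookup, ingredient) => lookup.get(normalize(ingredient)) ?? null;

const lookup = buildLookup(poisonVariants);
console.log(findPoison(lookup, "garlic powder")); // GARLIC
console.log(findPoison(lookup, "Palm Oil"));      // null
console.log(findPoison(lookup, "Sausage"));       // null
```

Because the comparison is on the full phrase, PALM OIL no longer matches SAGO PALM through the shared word PALM, and the trade-off is that every acceptable phrasing has to be listed, which is why the arrays get so large.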

Thank you, MongoDB team and community, for brainstorming.