Problems while using graphLookup to retrieve chained documents

Hello all. I would like to ask you for help with something.

I have a collection, let’s call it Interaction, where there are pairs of users who have interacted. For example:

[{
  "_id": {
    "$oid": "657b391695d44668a338651d"
  },
  "nameA": "john",
  "nameB": "frank"
},
{
  "_id": {
    "$oid": "657c6da195d44668a338651e"
  },
  "nameA": "martin",
  "nameB": "john"
},
{
  "_id": {
    "$oid": "657c77ed95d44668a3386521"
  },
  "nameA": "martin",
  "nameB": "albert"
},
{
  "_id": {
    "$oid": "657c7e5195d44668a3386522"
  },
  "nameA": "foo",
  "nameB": "bar"
}]

I wrote this pipeline:

[
  {
    $graphLookup: {
      from: "Interaction",
      startWith: "$nameA",
      connectFromField: "nameB",
      connectToField: "nameA",
      as: "chainA"
    },
  },
  {
    $graphLookup: {
      from: "Interaction",
      startWith: "$nameB",
      connectFromField: "nameA",
      connectToField: "nameB",
      as: "chainB",
    },
  },
  {
    $project: {
      group: {
        $setUnion: [
          "$chainA.nameA",
          "$chainA.nameB",
          "$chainB.nameA",
          "$chainB.nameB",
        ],
      },
    },
  },
  {
    $group: {
      _id: null,
      uniqueGroups: {
        $addToSet: "$group",
      },
    },
  }
]

My intention is to form groups with users that interacted. With the example dataset, I would expect two groups to be created:

  • albert, frank, john, martin
  • foo, bar

But instead, three groups are created:

  • albert, frank, john, martin
  • frank, john, martin
  • foo, bar

I did some tests and I suspect that it has something to do with the fact that relations are possible from nameA to nameA and nameB to nameB but also A → B and B → A.

Could you please help me with this? Why am I getting three groups?

I also tried adding more $graphLookup stages to cover every possible relation between the fields (A → A, A → B, B → A, B → B), but I ended up with the same result.

Thank you in advance.

You get the third group because of the document that contains frank.

frank is nameB in document 657b391695d44668a338651d, which has john as nameA, so the second $graphLookup starts with these values:

  • startWith nameB=frank
  • connectFromField nameA=john
  • connectToField nameB

The document 657c6da195d44668a338651e is found because its nameB is john, so the next lookup is done with:

  • connectFromField nameA=martin
  • connectToField nameB

Since martin never appears as nameB, the graph lookup stops there with frank, john and martin.
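The walkthrough above can be reproduced without a database. The sketch below is a simplified plain-JavaScript stand-in for $graphLookup, not MongoDB's implementation: the graphLookup and groupFor helpers are hypothetical names, and the _id values are shortened to doc1 through doc4 for readability. It only mirrors the matching rule described above (match the current values against connectToField, then follow connectFromField).

```javascript
// Sample data from the question (ids shortened to labels; an assumption for readability).
const interactions = [
  { _id: "doc1", nameA: "john", nameB: "frank" },
  { _id: "doc2", nameA: "martin", nameB: "john" },
  { _id: "doc3", nameA: "martin", nameB: "albert" },
  { _id: "doc4", nameA: "foo", nameB: "bar" },
];

// Hypothetical, simplified stand-in for $graphLookup: a breadth-first search
// that matches the current values against connectToField and then follows
// connectFromField of every newly found document.
function graphLookup(docs, startValue, connectFromField, connectToField) {
  const found = new Map();
  let frontier = [startValue];
  while (frontier.length > 0) {
    const next = [];
    for (const doc of docs) {
      if (frontier.includes(doc[connectToField]) && !found.has(doc._id)) {
        found.set(doc._id, doc);
        next.push(doc[connectFromField]);
      }
    }
    frontier = next;
  }
  return [...found.values()];
}

// Mirrors the two $graphLookup stages and the $setUnion projection of the question.
function groupFor(doc) {
  const chainA = graphLookup(interactions, doc.nameA, "nameB", "nameA");
  const chainB = graphLookup(interactions, doc.nameB, "nameA", "nameB");
  const names = [...chainA, ...chainB].flatMap((d) => [d.nameA, d.nameB]);
  return [...new Set(names)].sort();
}

// Mirrors the final $group with $addToSet: keep each distinct group once.
const groups = interactions.map(groupFor);
const uniqueGroups = [...new Set(groups.map((g) => g.join(",")))];
console.log(uniqueGroups);
```

In this sketch the john/frank document is the only one whose two chains never reach albert: chainA stops at frank, and chainB stops at martin because it only follows nameA values into nameB. That document alone contributes the extra frank, john, martin group.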

What surprises me is that you do not get the shorter chains such as martin, john and martin, albert, because $graphLookup produces them, as seen in Mongo Playground.

Sorry for the late response.

I ended up applying a first transformation that turns each document into something like:

{
   names: [ "nameA", "nameB"]
}

This way I’ve been able to use graphLookup and get the result I was looking for.
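The exact pipeline was not posted, so the following is only a sketch of how such a transformation might look, not the pipeline actually used. The collection name InteractionNames, the output field names chain and group, and the two constant names are assumptions made for illustration. It relies on $graphLookup following each element of an array individually, so a single lookup on the names field walks the relation in both directions.

```javascript
// Step 1 (assumed): copy each pair into a single array field and write the
// result to a separate collection. "InteractionNames" is a hypothetical name.
const buildNamesCollection = [
  { $project: { names: ["$nameA", "$nameB"] } },
  { $out: "InteractionNames" },
];

// Step 2 (assumed): one $graphLookup over the array field. Every element of
// "names" is used both as a start value and as a value to follow.
const groupPipeline = [
  {
    $graphLookup: {
      from: "InteractionNames",
      startWith: "$names",
      connectFromField: "names",
      connectToField: "names",
      as: "chain",
    },
  },
  {
    $project: {
      // "$chain.names" is an array of arrays, so merge it into one set.
      group: {
        $reduce: {
          input: "$chain.names",
          initialValue: [],
          in: { $setUnion: ["$$value", "$$this"] },
        },
      },
    },
  },
  {
    $group: {
      _id: null,
      uniqueGroups: { $addToSet: "$group" },
    },
  },
];
```

Under these assumptions, buildNamesCollection would be run once with db.Interaction.aggregate(...) and groupPipeline with db.InteractionNames.aggregate(...). Note that the element order produced by $setUnion is documented as unspecified, so if identical groups ever came back in different orders, $addToSet would keep both; sorting each group before the $group stage would guard against that.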

Thank you!
