How can I find all docs where a user has only a single doc (aggregation problem)?

Hope the title topic wasn’t too misleading, let me explain the problem a bit more:

  • we do have a collection called Matches - it always has two fields kit1 and kit2 which represent users DNA kits from another Kits collection (foreign key in SQL lingo).
  • I have a list of kits that can be in either kit1 or kit2, that can be solved in two separate queries and using $in
  • But I want to find all docs where either kit1 or kit2 is in that list of kits and there is only a single doc for this specific kit1 / kit2 combination

Not sure if anyone understands, so I better give an example:

List of kits: [‘AA’, ‘BB’, ‘CC’]
Matches docs that I’m trying to find (as their count will be === 1):
{ kit1: ‘AA’, kit2’ ABV’, chr: ‘1’ }
{ kit1: ‘XX’, kit2: ‘CC’, chr: ‘5’ }

Matches docs that should be excluded from the result (as their count > 1)"
{ kit1: ‘BB’, kit2: ‘HR’, chr: ‘8’ }
{ kit1: ‘BB’, kit2: ‘HR’, chr: ‘X’ }

I’m not even sure this can be done in MongoDb directly but would appreciate the aggregation been done there as we already have 700,000 docs in Matches and will soon cross 1 million.

Thanks in advance,

Andreas

This does what you need i think.Filter so kit1 or kit2 inside kits,
groupby kit1 kit2,keep only those that exist 1 time(the array sum count =1),
and unwind to root to have the original document.

var kits = ["AA","BB","CC"];

[
  {
    "$match": {
      "$expr": {
        "$or": [
          {
            "$in": [
              "$kit1",
              kits
            ]
          },
          {
            "$in": [
              "$kit2",
              kits
            ]
          }
        ]
      }
    }
  },
  {
    "$group": {
      "_id": {
        "kit1": "$kit1",
        "kit2": "$kit2"
      },
      "doc": {
        "$push": {
          "kit1": "$kit1",
          "kit2": "$kit2",
          "chr": "$chr"
        }
      }
    }
  },
  {
    "$match": {
      "$expr": {
        "$eq": [
          {
            "$size": "$doc"
          },
          1
        ]
      }
    }
  },
  {
    "$unwind": {
      "path": "$doc"
    }
  },
  {
    "$replaceRoot": {
      "newRoot": "$doc"
    }
  }
]

There is a solution also using less memory,on group by to not store all the document,but only the
chr but it needs the document to be contructed again later.

var kits = ["AA","BB","CC"];

[
  {
    "$match": {
      "$expr": {
        "$or": [
          {
            "$in": [
              "$kit1",
              kits
            ]
          },
          {
            "$in": [
              "$kit2",
              kits
            ]
          }
        ]
      }
    }
  },
  {
    "$group": {
      "_id": {
        "kit1": "$kit1",
        "kit2": "$kit2"
      },
      "chr": {
        "$push": "$chr"
      }
    }
  },
  {
    "$match": {
      "$expr": {
        "$eq": [
          {
            "$size": "$chr"
          },
          1
        ]
      }
    }
  },
  {
    "$project": {
      "doc": {
        "$map": {
          "input": "$chr",
          "as": "chr",
          "in": {
            "kit1": "$_id.kit1",
            "kit2": "$_id.kit2",
            "chr": "$$chr"
          }
        }
      }
    }
  },
  {
    "$unwind": {
      "path": "$doc"
    }
  },
  {
    "$replaceRoot": {
      "newRoot": "$doc"
    }
  }
]

Thank you @Takis - can I ask you one more question? Where do I identify the collection that I use in your code example?

In the first $match block it should query the Matches collection, I don’t see any reference to it.

The latter part is excellent and I didn’t know how to use these aggregation steps, thanks for that.

How to name the database and the collection depends on the driver you use.
On mongo shell you do
use yourDatabaseName
db.yourCollectionName.agreegate([…the-stages…])

Its simple in all drivers,see the driver tutorial on aggregate example
For example python
https://pymongo.readthedocs.io/en/stable/examples/aggregation.html

The code works assuming kits list is not like a very big list.

Hope it helps