How can I refer to a field from a different collection to the one I'm querying?

Ozy · November 3, 2021, 2:05pm

Hi all,

I want to retrieve documents from CollectionA where the groupId fields of the documents match the groupId of a document from CollectionB. For example, say I had a document in CollectionB as such;

Collection B
    {
    	name:	"jane"
    	groupId: 	"47"
    }

I want to then retrieve all documents from collectionA that have a matching groupId field;

Collection A
    {
    	asset:	"Asset 1"
    	groupId:	"47"
    }
    {
    	asset:	"Asset 2"
    	groupId:	"47"
    }
    {
	asset:	"Asset 3"
	groupId:	"48"
    }

So the result of my query would be;

{
	asset:	"Asset 1"
	groupId:	"47"
}
{
	asset:	"Asset 2"
	groupId:	"47"
}

I’m aware this could be done with an aggregate query using $lookup, which I have done but it’s quite slow and I need to use $project to remove the user data as it joins the results.

steevej · November 3, 2021, 10:18pm

Slow is always relative and depends on some many factors. Relative to the size of your data and the characteristics of your environment.

First thing, is to share what you tried. This way if there is something obvious we can find out. And this ensure we do not work and proposed something you have already tried and are unsatisfied.

From the little information we have. I do not see any way out, except for a schema redesign, of a lookup. Unless you know that you want assets of groupId:47. In this case a match from A is sufficient. But you mentioned lookup so I guess that what you want is to find jane’s assets. Right? The lookup is then inevitable.

What are the indexes? To find jane’s assets, you at least need a index on name in the collection B. But to find jane’s assets in Collection(A), you might be better serve with a compound index on name and groupId in collection B. This can help avoiding to read B’s documents from disk. Obviously to make the lookup faster you need an index of groupId in collection A. Also a compound index on groupId and asset on Collection A together with an appropriate project may help avoid reading looked up documents from disk.

Ozy · November 4, 2021, 12:06pm

Thanks for responding,

Collection B just holds information on end users. I have an API which makes calls to this database, and Collection B ties their API credentials to their groupId, name. The full document would be something like;

Collection B
{
    name:	"jane",
    groupId: 	"47"
    apiId: "jdoe"
}

I then use this query to get documents from collection A;

qry = [

    {
        "$lookup": {
            "from": "Collection B",
            "localField": "groupId",
            "foreignField": "groupId",
            "as": "user_data"
        }
    },

    {
        "$match": {
            "user_data.apiId": "jdoe"
            "someCollectionAFieldToFilterBy" : "something" 
        }
    }
    {
        "$project": {
            "user_data": 0
        }
    }
]

This ensures that end-users can only see documents that match their groupID. I need to do this as there will be multiple teams accessing this database, each with their own assets, surveys, clients.

The reason I was looking to see if I could reference fields from other collections is that some queries would involve getting assets that bellong to particular clients. I’m making a separate collection called client which stores client information such as address, sites, id. Each client document would be quite large so embedding them within asset documents just for queries to be faster doesn’t seem like such a good idea.

If I were to query for all documents belonging to a certain client, I would need to do two lookups in one query. Is there a better way for me to limit which documents the end user can see?

John_Page · November 4, 2021, 3:21pm

Hi,
What you are doing is essentially correct although I would caution you that sort of relational design is sub-optimal. Your lookup will be slow because you are also Joining everything in Collection A to Everything in Collection B and then filtering it - if you put the $match for Collection A first then you will partly avoid that. Then you really only want to join where groupId matches and apiId=Joe - for this you want to use the other form of lookup that takes as an argument an aggregation pipeline that then does a $match to select which records to join. [https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#join-conditions-and-subqueries-on-a-joined-collection]