Efficient Lookup for data fetching from databases (two collections)

I have two collections, one with one million records and the other with ten million documents. Using aggregate matching and look up, I’m attempting to retrieve all collections from collection 2 with a common local-field/foreign field value in collection 1. However, it is not as efficient in terms of time. Can anyone suggest anything else I might do to get around this time limit on extensive data?

Hi @Aman_Jaiswal2, welcome to the community. :wave:
Do you have an index on the fields that you are using for $lookup? Having an index on those fields can significantly increase the performance of your pipeline.
Would it be okay for you to share sample documents from both collections?
Also, please share your aggregation pipeline with us as well.

If you have any doubts, please feel free to reach out to us.

Thanks and Regards.
Sourabh Bagrecha,
MongoDB

Hello, @SourabhBagrecha Thanks for the quick answer.

No, I don’t think I’ll use an index on any fields for the time being.
Yes, sample documents for both collections are as follows:

Collection 1 (1 Crore document )

{
    "_id": {
        "$oid": "6267950f0ef8c21faf3ac0ac"
    },
    "OwnerID": 3650185,
    "TeamID": 9697,
    "MemberID": 2308,
    "ReportsTo": 15,
    "ID": 1046414917,
    "LastMet": {
        "$date": "2017-10-12T00:00:00.000Z"
    },
    "LastUpdated": {
        "$date": "2021-03-26T00:00:00.000Z"
    },
    "LastContacted": {
        "$date": "2018-06-07T00:00:00.000Z"
    },
    "Date": {
        "$date": "2019-01-13T00:00:00.000Z"
    }
}

Collection 2 (10 Crore documents)

{
    "_id": {
        "$oid": "6257c13ce98303a96f4c8c94"
    },
    "AccountID": 61320380681,
    "AssignedToID": 297,
    "AssignedToName": "50579",
    "Caption": "8244853",
    "CreatedBy": "6697955755",
    "LastActionID": 360319,
    "MappingId": 78298773,
    "RMModifiedbyO1": "66877527",
    "Status": "7978224",
    "StatusCode": 3769242,
    "SubCategoryCCRAO1": "8493861710",
    "RelatedToTypeID": 17095,
    "Gender": "Female",
    "Age": 39,
    "Date": {
        "$date": "2020-04-28T00:00:00.000Z"
    }
}

Both these collections have a common field as ReportsTo (Collection1) and AssignedTo (Collection2).

The query I’m now executing is as follows…

my_db.aggregate([
            {'$match': {'ReportsTo': ReportsID}},
            {'$lookup':
             {
                'from': 'TestData',
                'localField': 'MemberID',
                'foreignField': 'AssignedToID',
                'as': 'Table_Test'
             }
            }
])

This is a contradiction compare to your goal of Efficient Lookup for data fetching from databases. You wrote

The very first step to increased performance has been suggested

Also have an index on the $match-ed fields.

Without indexes it will always be

To understand the importance of indexes, take MongoDB Courses and Trainings | MongoDB University.

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.