How to slice or limit for each array item inside $in and $match

Gaurav_Yadav · November 5, 2024, 5:17am

I am using mongodb in nodejs. I want to early select only 50 Documents per pallet number inside my pipeline first stage. How can I do it? I want to do this because given a pallet number it can have 1 to 1000 serial numbers associalted with it.

My Document schema and pipeline looks like below

{
PalletNumber: string // This might repeat
SerialNumber: string and unique 
}


const documents = await collection.aggregate([{
   $match:{
  // I want to early select only 50 Documents per unique pallet number
   PalletNumber:{ $in: PalletNumbers}
  },
 {
  $group: {_id: "PalletNumber", SerialNumbers: {$push:"$SerialNumbers"}}
 }
}])

example of documents
{
_id: A
PalletNumber: SS // This remains same
SerialNumber: 01 // this can range from 1 to 1000
},
{
_id: A
PalletNumber: SS1 // This remains same
SerialNumber: 01 // this can range from 1 to 1000
},
{
_id: A
PalletNumber: SS2 // This remains same
SerialNumber: 01 // this can range from 1 to 1000
}

John_Sewell · November 5, 2024, 12:20pm

Couple of questions…

Why do you want to do this, is it to get it running faster as it currently runs slow?
How many documents / big is the collection
How do you define when to include a Serial Number? You’re not sorting so is it the first 50 or random 50?

How about something like a $lookup onto itself with a pipeline limiting the matches?

Gaurav_Yadav · November 6, 2024, 6:35am

I am going to answer your questions

Why do you want to do this, is it to get it running faster as it currently runs slow?

I come from sql world. Honestly it is not running slow as of now. I just want to reduce the number of reads.

$match:{
  // I want to early select only 50 Documents per unique pallet number
   PalletNumber:{ $in: ["ABC1", "ABC2"]}
  },
saw this inside my monogdb nodejs query planner and this is just two pallet numbers 

{
in: 6720 // I want to reduce this number that's also
out:6720
}

As the number of pallets grow this is going to increase.

How do you define when to include a Serial Number? You’re not sorting so is it the first 50 or random 50

But in reality all i care about it Getting first 50 or 50 random pallets for each pallet number inside my match. Considering pallet numbers are always unique.

How about something like a $lookup onto itself with a pipeline limiting the matches?
So for $lookup it seems I need to create two Collection one for Pallet and one for Serial Numbers ? As of now all of this field is combined into one single Collection like below

{
_id: Objectid
PalletNumber: string // This might repeat
SerialNumber: string and unique
}

John_Sewell · November 6, 2024, 10:18am

You can join a collection onto itself, something like this:

var insertData = [];

for (var x=0; x<100;x++ ){
    var palletNumber = 'P' + Math.floor(Math.random() * 10000);
    for (var y=0;y<1000 ; y++ ){
        insertData.push({
            PalletNumber: palletNumber,
            SerialNumber: y
        })
    } 
        
}

db.getCollection("Demo").insertMany(insertData)


db.getCollection("Demo").aggregate([
{
    $match:{
        'PalletNumber':{$in:['P3072']}
    }
},
{
    $group:{
        _id:'$PalletNumber'
    }
},
{
    $lookup:{
        from:'Demo',
        let:{
            pallet_palletID:'$_id'
        },
        pipeline:[
            {
                $match:{
                    $expr:{
                        $eq:[
                            '$$pallet_palletID',
                            '$PalletNumber'
                        ]
                    }
                }
            },
            {
                $limit:50
            }
        ],
        as:'lookuips'
    }
}
])

John_Sewell · November 6, 2024, 10:21am

I’ve left out indexes above as the data is so small but you can setup an index so that the pallet grouping etc is supplied from a covered index so that query can be read directly from the index / ram for performance and then the lookup is again handled by the index so fast.

Gaurav_Yadav · November 6, 2024, 10:51am

Thanks for you help.

John_Sewell · November 6, 2024, 11:04am

No problem! Hopefully this points you in a direction that works.