I am using mongodb in nodejs. I want to early select only 50 Documents per pallet number inside my pipeline first stage. How can I do it? I want to do this because given a pallet number it can have 1 to 1000 serial numbers associalted with it.
My Document schema and pipeline looks like below
{
PalletNumber: string // This might repeat
SerialNumber: string and unique
}
const documents = await collection.aggregate([{
$match:{
// I want to early select only 50 Documents per unique pallet number
PalletNumber:{ $in: PalletNumbers}
},
{
$group: {_id: "PalletNumber", SerialNumbers: {$push:"$SerialNumbers"}}
}
}])
example of documents
{
_id: A
PalletNumber: SS // This remains same
SerialNumber: 01 // this can range from 1 to 1000
},
{
_id: A
PalletNumber: SS1 // This remains same
SerialNumber: 01 // this can range from 1 to 1000
},
{
_id: A
PalletNumber: SS2 // This remains same
SerialNumber: 01 // this can range from 1 to 1000
}
Couple of questions…
Why do you want to do this, is it to get it running faster as it currently runs slow?
How many documents / big is the collection
How do you define when to include a Serial Number? You’re not sorting so is it the first 50 or random 50?
How about something like a $lookup onto itself with a pipeline limiting the matches?
I am going to answer your questions
Why do you want to do this, is it to get it running faster as it currently runs slow?
I come from sql world. Honestly it is not running slow as of now. I just want to reduce the number of reads.
$match:{
// I want to early select only 50 Documents per unique pallet number
PalletNumber:{ $in: ["ABC1", "ABC2"]}
},
saw this inside my monogdb nodejs query planner and this is just two pallet numbers
{
in: 6720 // I want to reduce this number that's also
out:6720
}
As the number of pallets grow this is going to increase.
How do you define when to include a Serial Number? You’re not sorting so is it the first 50 or random 50
But in reality all i care about it Getting first 50 or 50 random pallets for each pallet number inside my match. Considering pallet numbers are always unique.
How about something like a $lookup onto itself with a pipeline limiting the matches?
So for $lookup it seems I need to create two Collection one for Pallet and one for Serial Numbers ? As of now all of this field is combined into one single Collection like below
{
_id: Objectid
PalletNumber: string // This might repeat
SerialNumber: string and unique
}
You can join a collection onto itself, something like this:
var insertData = [];
for (var x=0; x<100;x++ ){
var palletNumber = 'P' + Math.floor(Math.random() * 10000);
for (var y=0;y<1000 ; y++ ){
insertData.push({
PalletNumber: palletNumber,
SerialNumber: y
})
}
}
db.getCollection("Demo").insertMany(insertData)
db.getCollection("Demo").aggregate([
{
$match:{
'PalletNumber':{$in:['P3072']}
}
},
{
$group:{
_id:'$PalletNumber'
}
},
{
$lookup:{
from:'Demo',
let:{
pallet_palletID:'$_id'
},
pipeline:[
{
$match:{
$expr:{
$eq:[
'$$pallet_palletID',
'$PalletNumber'
]
}
}
},
{
$limit:50
}
],
as:'lookuips'
}
}
])
2 Likes
I’ve left out indexes above as the data is so small but you can setup an index so that the pallet grouping etc is supplied from a covered index so that query can be read directly from the index / ram for performance and then the lookup is again handled by the index so fast.
1 Like
No problem! Hopefully this points you in a direction that works.
1 Like