Hi,
I’m wondering if there is a way to batch the creation of a materialized view. For example, I have a collection with 22 million records spanning 83,000 unique userIds. I want to create an MV that counts all the records associated with each unique userId. I have the query for that, but creating the view over all those records drives up CPU and memory usage. So I wonder if I can say I want to merge 1000k at a time to limit the CPU and memory usage?
Hi @Centurion_Wood ,
Welcome to the MongoDB community.
I guess you can limit a $merge command by applying a filter so that each run processes only 1000k records, advancing the filter as you go.
Thanks
Pavel
Thank you for the welcome @Pavel_Duchovny! Are you suggesting adding a limit stage to my pipeline, say 1000k, and progressing through the collection? If that is correct, how would I make sure I’m getting new records each run?
Hi @Centurion_Wood ,
You can sort by _id and limit to 1000; once finished, filter with $gt on the last _id of the batch and take a new 1000…
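This keyset-pagination idea can be sketched in plain JavaScript over an in-memory array (not mongosh; the document count and batch size are illustrative):

```javascript
// Simulate keyset pagination: sort by _id, take a batch,
// then resume where the previous batch left off with _id > lastId.
const docs = Array.from({ length: 10 }, (_, i) => ({ _id: i + 1 }));

function nextBatch(collection, lastId, batchSize) {
  return collection
    .filter(d => lastId === null || d._id > lastId)
    .sort((a, b) => a._id - b._id)
    .slice(0, batchSize);
}

let lastId = null;
const batches = [];
for (;;) {
  const batch = nextBatch(docs, lastId, 4);
  if (batch.length === 0) break; // no documents left
  batches.push(batch.map(d => d._id));
  lastId = batch[batch.length - 1]._id; // resume point for the next run
}
console.log(JSON.stringify(batches)); // → [[1,2,3,4],[5,6,7,8],[9,10]]
```

Because each run restarts from the last _id seen, every document is visited exactly once, no matter how many runs it takes.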
Is that possible?
Thanks
Pavel
Hmmm… That is an idea. Thinking more about it, I think I would need to $lookup against the collection I’m $merge-ing into to get the last _id.
@Pavel_Duchovny Thanks for the idea.
So this is what I have so far, incorporating some of the thoughts you added. I used 50 as a small test. I guess I’m still not able to wrap my head around how to get the last _id from the batch.
[{ $match: { "deleted": false } },
 { $sort: { "_id": 1 } },
 { $limit: 50 },
 { $group: {
     "_id": "$userId",
     "n": { "$sum": 1 }
 } }]
Hi @Centurion_Wood
Well, I thought of a process with a few steps:
db.collection.aggregate([
  { $match: { "deleted": false } },
  { $sort: { "_id": 1 } },
  { $limit: 50 },
  { $merge: { into: "collectionB", ... } }
])

var cursor = db.collectionB.find({}, { _id: 1 }).sort({ _id: -1 }).limit(1);
var maxId = cursor.next()._id;

db.collection.aggregate([
  { $match: { "deleted": false, _id: { $gt: maxId } } },
  { $sort: { "_id": 1 } },
  { $limit: 50 },
  { $merge: { into: "collectionB", ... } }
])
...
Something of this nature.
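Putting the steps together, here is a minimal in-memory JavaScript sketch of the whole loop (plain Node, not mongosh; the sample documents, batch size, and names are illustrative). It tracks the resume point from the batch itself rather than querying collectionB:

```javascript
// In-memory sketch of the batched group-and-count process.
// Each pass: filter out deleted docs, keep only _id > maxId, sort by _id,
// take a batch, count per userId within the batch, and fold the partial
// counts into collectionB (summing when the userId already exists).
const collectionA = [
  { _id: 1, userId: "u1", deleted: false },
  { _id: 2, userId: "u2", deleted: false },
  { _id: 3, userId: "u1", deleted: true  },
  { _id: 4, userId: "u1", deleted: false },
  { _id: 5, userId: "u2", deleted: false },
];
const collectionB = new Map(); // userId -> n

let maxId = null;
const batchSize = 2;
for (;;) {
  const batch = collectionA
    .filter(d => !d.deleted && (maxId === null || d._id > maxId))
    .sort((a, b) => a._id - b._id)
    .slice(0, batchSize);
  if (batch.length === 0) break; // all documents processed
  // Equivalent of $group: { _id: "$userId", n: { $sum: 1 } } within the batch
  for (const d of batch) {
    collectionB.set(d.userId, (collectionB.get(d.userId) || 0) + 1);
  }
  maxId = batch[batch.length - 1]._id; // resume point for the next pass
}
console.log(JSON.stringify(Object.fromEntries(collectionB))); // → {"u1":2,"u2":2}
```

One caveat for the real pipeline: since the same userId can appear in several batches, the $merge stage would need to add each new partial count to the existing one (for example via a whenMatched pipeline) rather than replace the document, otherwise later batches would overwrite earlier counts.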
Thanks
Pavel
Ah, OK! Let me take a look at that. It looks like that could work. Thank you @Pavel_Duchovny!