Hi,
I’m wondering if there is a way to batch the creation of a materialized view. For example, I have a collection with 22 million records spanning 83,000 unique userIds. I want to create an MV that counts all the records associated with each unique userId. I have the query for that, but creating the view over all those records drives up CPU and memory usage. So I wonder if I can say I want to merge 1000k at a time to limit the CPU and memory usage?
Hi @Centurion_Wood ,
Welcome to the MongoDB community.
I guess you can limit a $merge command by applying a filter so that each run processes only 1000k records, advancing the filter as you go.
Thanks
Pavel
Thank you for the welcome @Pavel_Duchovny! Are you suggesting adding a limit stage to my pipeline, say 1000k, and progressing through the collection? If that is correct, how would I make sure I’m getting new records each run?
Hi @Centurion_Wood ,
You can sort by _id and limit to 1000; once finished, filter with $gt on the last _id of the batch and take a new 1000…
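This keyset-pagination idea can be sketched in plain JavaScript over an in-memory array (not mongosh; the document count and batch size are illustrative):

```javascript
// Simulate keyset pagination: sort by _id, take a batch,
// then resume where the previous batch left off with _id > lastId.
const docs = Array.from({ length: 10 }, (_, i) => ({ _id: i + 1 }));

function nextBatch(collection, lastId, batchSize) {
  return collection
    .filter(d => lastId === null || d._id > lastId)
    .sort((a, b) => a._id - b._id)
    .slice(0, batchSize);
}

let lastId = null;
const batches = [];
for (;;) {
  const batch = nextBatch(docs, lastId, 4);
  if (batch.length === 0) break; // no documents left
  batches.push(batch.map(d => d._id));
  lastId = batch[batch.length - 1]._id; // resume point for the next run
}
console.log(JSON.stringify(batches)); // → [[1,2,3,4],[5,6,7,8],[9,10]]
```

Because each run restarts from the last _id seen, every document is visited exactly once, no matter how many runs it takes.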
Is that possible?
Thanks
Pavel
Hmmm… That is an idea. Thinking more about it, I think I would need to $lookup against the collection I’m $merge-ing into to get the last _id.
@Pavel_Duchovny Thanks for the idea.
So this is what I have so far, incorporating some of the thoughts you added. I used 50 as a small test. I guess I’m still not able to wrap my head around how to get the last _id from the batch.
[{ $match: { "deleted": false } },
 { $sort: { "_id": 1 } },
 { $limit: 50 },
 { $group: {
     "_id": "$userId",
     "n": { "$sum": 1 }
 } }]
Hi @Centurion_Wood
Well, I thought of a process with a few steps:
db.collection.aggregate([
  { $match: { "deleted": false } },
  { $sort: { "_id": 1 } },
  { $limit: 50 },
  { $merge: { into: "collectionB", ... } }
])

var cursor = db.collectionB.find({}, { _id: 1 }).sort({ _id: -1 }).limit(1);
var maxId = cursor.next()._id;

db.collection.aggregate([
  { $match: { "deleted": false, _id: { $gt: maxId } } },
  { $sort: { "_id": 1 } },
  { $limit: 50 },
  { $merge: { into: "collectionB", ... } }
])
...
Something of this nature.
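Putting the steps together, here is a minimal in-memory JavaScript sketch of the whole loop (plain Node, not mongosh; the sample documents, batch size, and names are illustrative). It tracks the resume point from the batch itself rather than querying collectionB:

```javascript
// In-memory sketch of the batched group-and-count process.
// Each pass: filter out deleted docs, keep only _id > maxId, sort by _id,
// take a batch, count per userId within the batch, and fold the partial
// counts into collectionB (summing when the userId already exists).
const collectionA = [
  { _id: 1, userId: "u1", deleted: false },
  { _id: 2, userId: "u2", deleted: false },
  { _id: 3, userId: "u1", deleted: true  },
  { _id: 4, userId: "u1", deleted: false },
  { _id: 5, userId: "u2", deleted: false },
];
const collectionB = new Map(); // userId -> n

let maxId = null;
const batchSize = 2;
for (;;) {
  const batch = collectionA
    .filter(d => !d.deleted && (maxId === null || d._id > maxId))
    .sort((a, b) => a._id - b._id)
    .slice(0, batchSize);
  if (batch.length === 0) break; // all documents processed
  // Equivalent of $group: { _id: "$userId", n: { $sum: 1 } } within the batch
  for (const d of batch) {
    collectionB.set(d.userId, (collectionB.get(d.userId) || 0) + 1);
  }
  maxId = batch[batch.length - 1]._id; // resume point for the next pass
}
console.log(JSON.stringify(Object.fromEntries(collectionB))); // → {"u1":2,"u2":2}
```

One caveat for the real pipeline: since the same userId can appear in several batches, the $merge stage would need to add each new partial count to the existing one (for example via a whenMatched pipeline) rather than replace the document, otherwise later batches would overwrite earlier counts.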
Thanks
Pavel
Ah, OK! Let me take a look at that. It looks like that could work. Thank you @Pavel_Duchovny!