How can i optimize update query

Below query is taking lots of time to execute. Can someone help in optimising it.

try {
    var List = db.getCollection('XXX').distinct('xxx');
    for (var i = 0; i < List.length; i++) {
        var isb = List[i];
        var idNumber = 10001;
        try {

            db.getCollection('xxx').find({xxx: isb}).sort({xxx:1}).forEach(function(doc){
                db.getCollection('xxx').updateOne({
                    _id: doc._id
                }, {
                    $set: {
                        xxx: NumberInt(idNumber)
                    }
                });
                idNumber++;
                    printjson(`${xxx} : Success`)
                })
    
        } catch (e) {
            printjson(`${xxx} : Error Found`)
            printjson("ERR:: " + e);
        }
    }

} catch (e) {
    printjson("ERR:: " + e);
}

Bulk update and batch up.

1 Like

Thanks!. Is there any example if you can share.(considering above requirement)

Create a bulk object and add updates to it, when it gets to a certain size call execute and then reset the bulk object and repeat.

Rough pseudocode:

var batchSize = 1000;
var progressCounter = 0;
var bulkObject = db.getCollection('XXX').initializeUnorderedBulkOp();
var sourceData = db.getCollection('SourceColl').distinct('theField');
var loopCounter = 10001;

sourceData.forEach(theItem =>{
	print(`Processing: ${theItem}`)
	db.getCollection.find({'theField':theItem}, {_id:1}).sort({sortField:1}).forEach(theDoc =>{
		progressCounter++;
		bulk.find( { _id: theDoc._id } ).update( { $set: { counter: loopCounter++ } } );
		if(progressCounter % batchSize == 0){
			var results = bulk.execute();
			bulkObject = db.getCollection('XXX').initializeUnorderedBulkOp();		
		}	
	})
})

if(progressCounter % batchSize != 0){
	var results = bulk.execute();
	bulkObject = db.getCollection('XXX').initializeUnorderedBulkOp();		
}

From the looks of things you want to add an incrementing fields for each group, so if you had a collection of cars, with each car document having a brand, for each brand you want to add a field to each one that increments, i.e.

{_id:0, brand:'ford'},
{_id:1, brand:'ford'},
{_id:2, brand:'ford'},
{_id:3, brand:'VW'},
{_id:4, brand:'VW'},
{_id:5, brand:'VW'},

Would get updated to:

{_id:0, brand:'ford', newField:10001},
{_id:1, brand:'ford', newField:10002},
{_id:2, brand:'ford', newField:10003},
{_id:3, brand:'VW', newField:10001},
{_id:4, brand:'VW', newField:10002},
{_id:5, brand:'VW', newField:10003},

Is this right?

To start with you could wrap the inner loop in the bulk operation block so do one (or many depending on the batch size) server call per group.

I was actually trying to add an incrementing field to a query a while back and it did not seem trivial, but you may be able to use the windowFields operator to do the grouping and adding of the new field.

How big is the collection and what’s the grouping look like in terms of number of groups and documents per group?

1 Like

Thank you john!
var brand = [‘ford’,‘VW’];

Here we want to update seq data for the list of brands where the incorrect sequece is stored. For this we want to sort the collection brandwise by seq first and then update the seq by increment number starting with 10001.

Current Data

{_id:0, brand:'ford', seq:10001},
{_id:1, brand:'ford',seq:10002},
{_id:2, brand:'ford',seq:10002},
{_id:3, brand:'VW',seq:10001},
{_id:4, brand:'VW',seq:10001},
{_id:5, brand:'VW',seq:10003}

Expected Data

{_id:0, brand:'ford',seq:10001},
{_id:1, brand:'ford',seq:10002},
{_id:2, brand:'ford',seq:10003},
{_id:3, brand:'VW',seq:10001},
{_id:4, brand:'VW',seq:10002},
{_id:5, brand:'VW',seq:10003}

The total collection document size for the list of records where update is required is not more than 25000 records.

Excellent, so something like I put above should work, note that I have no error handling in the above script, unlike your original so you will want to probably have something in for that and we capture the return value of the update query, you can collate these as you go to have a sanity check of matched and updated records.

If you do a simple query on a test collection and then printjson the results you can see what the properties are to pull out and store.

1 Like

Thanks john! Appreciate your help!

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.