Number of retrieved documents < batch size

Hello,

I have been using the MongoDB Go driver v1.11.1 with Go 1.19.
I was doing a data type migration and ran into an issue.

In one environment, I had to change the data type of over 100,000 documents. Since everything runs in pods, I couldn't load 100,000 documents into memory, so I used batchSize = 1000 in the FindOptions struct.

I was calling Find in an infinite loop and breaking out when cursor.ID() == 0.
The thing is, I was doing this check before calling cursor.Next(), so for clients whose number of matching documents was < batchSize the cursor was considered dead and none of their documents were updated.
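
To illustrate, here is a simplified sketch of the pattern (coll, ctx, and filter stand in for my actual collection, context, and query; the full code is further down in this thread):

opts := options.Find().SetBatchSize(1000)
for {
	cursor, err := coll.Find(ctx, filter, opts)
	if err != nil {
		return err
	}

	// This check runs before cursor.Next() is ever called.
	if cursor.ID() == 0 {
		break
	}

	for cursor.Next(ctx) {
		// decode and update the document
	}
}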

Is it intended behavior to set the cursor ID to 0 when the number of documents is < batchSize?

FYI, when the number of documents is > batchSize, everything works fine!

Hey @Dylan_Dinh,

Thank you for reaching out to the MongoDB Community forums!

To better understand the problem, may I ask you the following questions:

  • What specific data type migration were you performing?
  • Could you provide more details about the environment in which you had to change the data type?
  • How are the documents distributed across the pods? Are they evenly distributed or specific to certain pods?
  • Did you encounter any error messages or warnings during the migration process?
  • Have you tried any alternative approaches to handling the data type migration?
  • Could you share the code snippet that you are executing?
  • Also, please share the version of the MongoDB server you are currently using and where it is deployed.

Looking forward to hearing back from you.

Regards,
Kushagra

Hi @Kushagra_Kesav,

  • I was migrating an object field from BinData to plain text.
  • It was a migration running at the start of our pod, using the versions mentioned above. What specific information do you need?
  • A single pod was in charge of the migration, so 100,000 documents was too much for the available memory and led to OOM. I fixed this using the batch size option.
  • No, but we discovered in production that when the number of documents was < batchSize, those documents weren't updated.

Code snippet:

batchSize := int32(1000)
opts := &options.FindOptions{
	BatchSize: &batchSize,
}
for {
	cursor, err := db.Collection(pushedNotificationCollection).Find(context.Background(), bson.M{"payload": bson.M{"$type": "binData"}}, opts)
	if err != nil {
		return err
	}

	// Checked before cursor.Next(): when all matching documents fit in the
	// first batch, ID() is already 0 here, so those documents are skipped.
	if cursor.ID() == 0 {
		break
	}

	for cursor.Next(context.Background()) {
		var ai pushednotification.AlarmInfo
		var pn oldUserPushedNotification

		if err = cursor.Decode(&pn); err != nil {
			return err
		}

		err = json.Unmarshal(pn.Payload, &ai)
		if err != nil {
			return err
		}

		newUpn := buildNewUserPushedNotifFromOld(pn, ai)

		_, err = db.Collection(coll).ReplaceOne(context.Background(), cursor.Current, newUpn)
		if err != nil {
			return err
		}
	}
}
return nil

Is this the right way to do it, and as fast as possible?
Doing this check before calling ReplaceOne, when the number of documents is < batchSize, will in fact break out of the for loop and you miss those documents:

if cursor.ID() == 0 {
	break
}

The question is: why is cursor.ID() == 0 when the number of documents is < batchSize? I feel like it should only be 0 when there are no documents at all.

Maybe I shouldn't use the outer for loop, so I could get rid of that break, but I have a memory limitation on my pod, so loading 100,000 documents at once is not possible.
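
For reference, here is roughly what I have in mind (a sketch only, reusing the same imports and types as the snippet above, and assuming that the driver fetches later batches lazily during cursor.Next() and that oldUserPushedNotification has an ID field mapped to _id):

batchSize := int32(1000)
opts := &options.FindOptions{
	BatchSize: &batchSize,
}

cursor, err := db.Collection(pushedNotificationCollection).Find(context.Background(),
	bson.M{"payload": bson.M{"$type": "binData"}}, opts)
if err != nil {
	return err
}
defer cursor.Close(context.Background())

// Next() pulls documents batch by batch, so only about one batch is held
// in memory at a time; no outer loop or cursor.ID() check needed.
for cursor.Next(context.Background()) {
	var pn oldUserPushedNotification
	if err := cursor.Decode(&pn); err != nil {
		return err
	}

	var ai pushednotification.AlarmInfo
	if err := json.Unmarshal(pn.Payload, &ai); err != nil {
		return err
	}

	newUpn := buildNewUserPushedNotifFromOld(pn, ai)

	// Replace by _id instead of matching on cursor.Current.
	if _, err := db.Collection(pushedNotificationCollection).ReplaceOne(
		context.Background(), bson.M{"_id": pn.ID}, newUpn); err != nil {
		return err
	}
}
return cursor.Err()

If that assumption about lazy batch fetching holds, the memory limit shouldn't be a problem even without the outer loop and the break.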

  • db.version() → 4.4.0

Where in the documentation does it say that a cursor ID of zero indicates that the cursor does not contain any documents?