Change stream race causes reversing and mixing of updates

I have decentralized application working on several nodes (with sharded collections) and I’m using mongo change-streams to gather inserts/updates for statistics etc.
During some testing I caught a case where the change-stream updates are mixed (and reverse).

The test:

  1. inserts a new document with a field called size with value 5
  2. overwrites the same document with size 3 (and saving old_size as 5 in another field)
  3. “delete” the document by updating the deleted field (when a document is created, the value is 0 and in this update becomes 1)

In the change stream I expect 3 rows:

  1. insert operation
  2. update operation with the size/old_size changed
  3. update operation with the deleted field changed

On the other hand, this is what I get: (removed and changed some fields to show only what is relevant)

{'operationType': 'insert', 'clusterTime': Timestamp(1655982833, 4), 'fullDocument': {'_id': 'testobjbar', 'created_at': '1655982833.69878', 'deleted': 0, 'old_size': 0, 'size': 5}, 'ns': {'db': 'test', 'coll': 'col_test'}, 'documentKey': {'_id': 'testobjbar'}}
{'operationType': 'update', 'clusterTime': Timestamp(1655982835, 1), 'ns': {'db': 'test', 'coll': 'col_test'}, 'documentKey': {'_id': 'testobjbar'}, 'updateDescription': {'updatedFields': {'created_at': '1655982835.02931', 'old_size': 5, 'size': 3}, 'removedFields': []}, 'fullDocument': {'_id': 'testobjbar', 'created_at': '1655982835.02931', 'deleted': 1, 'old_size': 0, 'retain_until': 0.0, 'size': 3}}
{'operationType': 'update', 'clusterTime': Timestamp(1655982835, 2), 'ns': {'db': 'test', 'coll': 'col_test'}, 'documentKey': {'_id': 'testobjbar'}, 'updateDescription': {'updatedFields': {'deleted': 1, 'old_size': 0}, 'removedFields': []}, 'fullDocument': {'_id': 'testobjbar', 'created_at': '1655982835.02931', 'deleted': 1, 'old_size': 0, 'retain_until': 0.0, 'size': 3}}

As seen above:

  1. The document is inserted to db
  2. In the second row, instead of seeing data related only to the overwrite (size change) we can also observe that the deleted flag changed to 1 in the fullDocument fields, meaning that the row includes the overwrite and the subsequent deletion change that was done right after
  3. Then only the last row in the change stream indicates that deleted update occurred as part of the updatedFields

Some points to observe:

  1. I don’t make deletion as long as the document is already marked as deleted, and don’t make any operation on deleted documents, so I never expect to see a deleted document twice in a row in the change stream.
  2. I know that there might be races between fullDocument and updatedFields so I try to take updates from updatedFields and if they don’t exist we take them from fullDocument
  • Is this a know issue? I didn’t see any other related post
  • How can I solve this to have operations in order in the change-stream? or what is a valid technique to identify this?

To compare, if I run the test line by line and wait, I get good results in the change stream:

{'operationType': 'insert', 'clusterTime': Timestamp(1655984572, 3), 'fullDocument': {'_id': 'testobjbar', 'created_at': '1655984572.96063', 'deleted': 0, 'old_size': 0, 'retain_until': 0.0, 'size': 5}, 'ns': {'db': 'test', 'coll': 'col_test'}, 'documentKey': {'_id': 'testobjbar'}}
{'operationType': 'update', 'clusterTime': Timestamp(1655984581, 1), 'ns': {'db': 'test', 'coll': 'col_test'}, 'documentKey': {'_id': 'testobjbar'}, 'updateDescription': {'updatedFields': {'created_at': '1655984581.21706', 'old_size': 5, 'size': 3}, 'removedFields': []}, 'fullDocument': {'_id': 'testobjbar', 'created_at': '1655984581.21706', 'deleted': 0, 'old_size': 5, 'retain_until': 0.0, 'size': 3}}
{'operationType': 'update', 'clusterTime': Timestamp(1655984591, 2), 'ns': {'db': 'test', 'coll': 'col_test'}, 'documentKey': {'_id': 'testobjbar'}, 'updateDescription': {'updatedFields': {'deleted': 1, 'old_size': 0}, 'removedFields': []}, 'fullDocument': {'_id': 'testobjbar', 'created_at': '1655984581.21706', 'deleted': 1, 'old_size': 0, 'retain_until': 0.0, 'size': 3}}

What you observed might be the following:

The fullDocument document represents the most current majority-committed version of the updated document. The fullDocument document may vary from the document at the time of the update operation depending on the number of interleaving majority-committed operations that occur between the update operation and the document lookup.

Which is an extract from

also of interest is

3 Likes

thanks a lot for the response!

so you are saying that for example:

  1. delete changes must be taken only if its observed in the updatedFields, but better be ignored if shown in fullDocument?
  2. size updates should be taken from both? I had some time where only old_size was in the updated field, etc.

Or is it always preferable to look only in the updatedFields section to lookup for updates and ignore fullDocument?

Only you can answer that. It depends of your use-cases. The documentation is clear about the fact that the fullDocument may differ. What ever you do you have to do it by taking into account that the fullDocument may differ.