Automating a fix for BSONObjectTooLarge from a cluster-level change stream

We have implemented a cluster-level change stream watcher in the golang driver to help migrate schemas and clusters. We’ve been touching each document’s updated date to get the stream to pick each document up, and it’s been working great. While watching, my cursor errored out with:

```
(BSONObjectTooLarge) Executor error during getMore :: caused by :: BSONObj size: 19665104 (0x12C10D0) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "8262EB3576000002BB2B022C0100296E5A1004333EBB2C7745456497CD595172557DDC46645F696400645F20B948F93E9425F70F739F0004" }
```
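For context, here’s roughly how the watcher is set up. This is a minimal sketch, not our exact code; the URI and the empty pipeline are placeholders:

```go
package main

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// Cluster-level stream: Watch on the client rather than on a single
	// database or collection. We need the full document on every event,
	// which is why we can't simply project the large field away.
	opts := options.ChangeStream().SetFullDocument(options.UpdateLookup)
	stream, err := client.Watch(ctx, mongo.Pipeline{}, opts)
	if err != nil {
		log.Fatal(err)
	}
	defer stream.Close(ctx)

	for stream.Next(ctx) {
		var event bson.M
		if err := stream.Decode(&event); err != nil {
			log.Fatal(err)
		}
		// ... migration logic per event ...
	}
	// BSONObjectTooLarge surfaces here once the oversized event is hit.
	if err := stream.Err(); err != nil {
		log.Fatal(err)
	}
}
```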

I can’t re-open the change stream, because the resume token is stuck on this document. The hope is to fix the old document and then re-open the change stream at that resume token. The problem is that I can’t locate the document: we have thousands of collections (one of the reasons we’re doing this data migration project), and I can’t figure out which document in particular is the problem child.

In production, I’d like to handle the error automatically by applying a known fix (trimming an overly long array), but I can’t figure out how to get from the error message to the document that needs fixing. Our use case needs access to the full document every time, so we can’t project the large field away altogether. The fix can be complex at call time, so if the answer is something like using the resume token, querying the oplog directly, and performing the update on the old document, that’s doable.
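To make the oplog idea concrete, here is what I have in mind. From what I can tell, the token’s _data value is a hex-encoded KeyString whose leading bytes carry the event’s cluster time (a 0x82 type byte, then four bytes of seconds and four of increment). That layout is internal and undocumented, so treat the parsing below as an assumption rather than fact; if it holds, the timestamp gets you the oplog entry, and the oplog entry carries the namespace and document _id:

```go
package main

import (
	"context"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/primitive"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// clusterTimeFromToken assumes _data starts with a one-byte KeyString type
// tag (0x82 = Timestamp) followed by 4 bytes of seconds and 4 bytes of
// increment, big-endian. This is an undocumented internal format, so the
// layout may change between server versions.
func clusterTimeFromToken(data string) (primitive.Timestamp, error) {
	raw, err := hex.DecodeString(data)
	if err != nil {
		return primitive.Timestamp{}, err
	}
	if len(raw) < 9 || raw[0] != 0x82 {
		return primitive.Timestamp{}, fmt.Errorf("unexpected token layout")
	}
	return primitive.Timestamp{
		T: binary.BigEndian.Uint32(raw[1:5]),
		I: binary.BigEndian.Uint32(raw[5:9]),
	}, nil
}

func main() {
	ctx := context.Background()
	// Note: local.oplog.rs is only visible on a replica set member, not
	// through mongos; on a sharded cluster you'd repeat this per shard.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// The _data hex string copied from the error message.
	ts, err := clusterTimeFromToken("8262EB3576000002BB2B022C0100296E5A1004333EBB2C7745456497CD595172557DDC46645F696400645F20B948F93E9425F70F739F0004")
	if err != nil {
		log.Fatal(err)
	}

	// The oplog entry at that timestamp carries the namespace ("ns") and,
	// for updates, the target document's _id ("o2").
	oplog := client.Database("local").Collection("oplog.rs")
	cur, err := oplog.Find(ctx, bson.M{"ts": ts})
	if err != nil {
		log.Fatal(err)
	}
	defer cur.Close(ctx)
	for cur.Next(ctx) {
		var entry bson.M
		if err := cur.Decode(&entry); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("ns=%v o2=%v\n", entry["ns"], entry["o2"])
	}
}
```

Once the namespace and _id are known, the known fix could be a pipeline-style update (MongoDB 4.2+). Continuing from the snippet above; the database, collection, field name `bigArray`, and keep-count of 1000 are placeholders for whatever the real schema needs:

```go
// Hypothetical fix: keep only the first 1000 elements of "bigArray".
coll := client.Database("someDB").Collection("someColl")
_, err = coll.UpdateOne(ctx,
	bson.M{"_id": docID}, // _id recovered from the oplog entry's "o2" field
	mongo.Pipeline{
		{{Key: "$set", Value: bson.M{
			"bigArray": bson.M{"$slice": bson.A{"$bigArray", 1000}},
		}}},
	},
)
```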

  1. Does the _id._data object contain information about which database/collection/objectId this is referencing?
  2. Is there a better method for decoding this error in the golang driver?
  3. Let’s say we can’t decode the token into a document ID. Is there a method for bypassing this one change and stepping to the next resume token? (A rough sketch of what I mean follows this list.)
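For question 3, this is the kind of thing I’d want to try (same imports as the watcher sketch above). I genuinely don’t know whether startAfter makes the server skip the oversized event or whether it still tries to materialize it; that’s the crux of the question:

```go
// reopenPastStuckEvent re-opens the cluster stream positioned just after the
// event whose token appeared in the error message. Whether this actually
// bypasses the oversized event is exactly what I'm unsure about.
func reopenPastStuckEvent(ctx context.Context, client *mongo.Client, stuckData string) (*mongo.ChangeStream, error) {
	opts := options.ChangeStream().
		SetFullDocument(options.UpdateLookup).
		SetStartAfter(bson.M{"_data": stuckData})
	return client.Watch(ctx, mongo.Pipeline{}, opts)
}
```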

Any help is appreciated.


Hi, I just ran into the same problem. Wondering if you managed to solve it somehow.
Cheers