Device Sync does not fully start due to error BSONObjectTooLarge

We created a realm/app service which fails to start with a BSONObjectTooLarge exception. This is very puzzling, since Atlas is supposed to restrict document size to 16MB already. Why would Sync fail? Also, how would we identify which document is the root cause, given that we have an existing database with many, many collections and documents?

Synchronization between Atlas and Device Sync has been stopped, due to error:
failed to register trigger after 10 attempts: recoverable event subscription error encountered: (BSONObjectTooLarge) PlanExecutor error during aggregation :: caused by :: BSONObj size: 17206212 (0x1068BC4) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: “826386F28C000000172B022C0100296E5A1004433CBC7ED98647F188BBE90FB8FA55E2463C5F6964003C42543248000004” }

This is unfortunately a long-standing limitation of MongoDB Change Streams: the size limit for a document is 16 MB, but the size limit for a Change Event is also 16 MB. This means that if you have a document that is, say, 14 MB and you update a large field (say 4 MB), the change event will contain the PreImage (14 MB) and the Update Description (4 MB), so the Change Event will be 18 MB and exceed the limit.
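The arithmetic above can be sketched in a few lines of JavaScript. This is only a rough illustration: the limit constant is the byte value quoted in the error message, the function names are made up for this example, and the estimate ignores the event's own metadata, so real events will be slightly larger.

```javascript
// Limit quoted in the error message ("Size must be between 0 and 16793600").
const MAX_EVENT_BYTES = 16793600;
const MB = 1024 * 1024;

// Hypothetical helper: a lower-bound estimate of a change event's size,
// assuming it carries the whole document plus the update description.
function estimateChangeEventBytes(documentBytes, updateDescriptionBytes) {
  return documentBytes + updateDescriptionBytes;
}

// Hypothetical helper: true when the estimated event would not fit.
function exceedsChangeEventLimit(documentBytes, updateDescriptionBytes) {
  return estimateChangeEventBytes(documentBytes, updateDescriptionBytes) > MAX_EVENT_BYTES;
}

// The example from the text: a 14 MB document with a 4 MB field update.
console.log(exceedsChangeEventLimit(14 * MB, 4 * MB));
```

The point of the sketch is that a document can be comfortably under the document limit and still produce an event that is over it.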

It seems like it should be possible to just “skip” this event, but unfortunately, due to the way that MongoDB handles these, it is impossible to move past this event without the risk of skipping other events. Additionally, we can't be sure which object caused this issue (since we can't see the full event), so we would “lose” those changes and could end up corrupting data by applying later changes to stale objects.

The MongoDB server team is in the process of fixing this, and once they have completed the work we will enable users who are on the most recent version of MongoDB (likely 7.0) to be able to move past this error.

One thing that could be helpful is to identify which documents are causing this issue. I have found this query to be helpful for identifying large documents in a collection:

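	// Replace myCollection with the collection you want to inspect.
	db.myCollection.aggregate([
	  { $project: { max: { $bsonSize: "$$ROOT" } } },  // BSON size of each document, in bytes
	  { $sort: { max: -1 } },                          // largest first
	  { $limit: 10 },
	])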
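Since the database in question has many collections, the same pipeline could be run against each of them in turn. The following is a minimal sketch, assuming it is pasted into mongosh, where `db.getCollectionNames()`, `db.getCollection()`, and `.aggregate(...).toArray()` are available; the function names and the `size` field name are chosen for this example only, and `$bsonSize` requires MongoDB 4.4 or newer.

```javascript
// Same pipeline as above, with the output field named "size".
function largestDocsPipeline(limit) {
  return [
    { $project: { size: { $bsonSize: "$$ROOT" } } },
    { $sort: { size: -1 } },
    { $limit: limit },
  ];
}

// `db` is assumed to behave like the mongosh `db` object.
// Returns the largest documents across all collections, biggest first.
function findLargestDocuments(db, perCollectionLimit = 10) {
  const results = [];
  for (const name of db.getCollectionNames()) {
    const docs = db
      .getCollection(name)
      .aggregate(largestDocsPipeline(perCollectionLimit))
      .toArray();
    for (const doc of docs) {
      results.push({ collection: name, _id: doc._id, size: doc.size });
    }
  }
  results.sort((a, b) => b.size - a.size);
  return results;
}
```

In mongosh, something like `findLargestDocuments(db).slice(0, 10)` would then list the ten largest documents together with their collection names. Note that this scans every document in every collection, so it may be slow on a large database.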

I was able to decode the resume token you posted which has the following information:

    "clusterTime": Timestamp(1669788300, 23),
    "clusterTimeReadable": "2022-11-30T06:05:00Z",
    "version": 1,
    "tokenType": "EventToken",
    "txnOpIndex": 0,
    "fromInvalidate": false,
    "uuid": "433cbc7e-d986-47f1-88bb-e90fb8fa55e2",
    "eventIdentifier": {"_id":"BT2H"}
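The cluster time above can be cross-checked against the raw token. The sketch below runs in Node.js and relies on an assumption about the token layout, not on a documented API: that the hex string starts with a one-byte type marker (`0x82`) followed by a 4-byte seconds value and a 4-byte increment, both big-endian. The function name is made up for this example, and the remaining fields of the token are not decoded.

```javascript
// Hypothetical helper: reads only the leading timestamp of a resume token.
// Assumed layout: 0x82 marker, 4-byte seconds, 4-byte increment (big-endian).
function decodeClusterTime(tokenHex) {
  const bytes = Buffer.from(tokenHex.slice(0, 18), "hex");
  if (bytes.length < 9 || bytes[0] !== 0x82) {
    throw new Error("token does not start with the expected timestamp marker");
  }
  const seconds = bytes.readUInt32BE(1);
  const increment = bytes.readUInt32BE(5);
  return {
    seconds,
    increment,
    readable: new Date(seconds * 1000).toISOString(),
  };
}

// Token copied from the error message in the original post.
const token =
  "826386F28C000000172B022C0100296E5A1004433CBC7ED98647F188BBE90FB8FA55E2463C5F6964003C42543248000004";
console.log(decodeClusterTime(token));
```

For the token in this thread, the decoded seconds and increment agree with the `Timestamp(1669788300, 23)` shown above.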

My bet would be that the object causing the issue is {"_id":"BT2H"} (I can't tell which namespace it is in, though). It is not certain that this is the document causing the issue, but it is very likely.