Detected corrupt BSON data error when cloning a huge DB

We have a huge DB (<500GB) hosted on AWS. When we clone it and try to access the clone, we get this message: “Detected corrupt BSON data for field path ‘domain’ at offset 62 {“exception”:”[object] (MongoDB\Driver\Exception\UnexpectedValueException(code: 0): Detected corrupt BSON data for field path ‘domain’ at offset 62 at vendor/jenssegers/mongodb/src/Jenssegers/Mongodb/Query/Builder.php:410)"

Index rebuilding fails during the MongoDB repair, a process that takes days to complete. We are using Laravel and the official MongoDB PHP driver (mongodb/mongo-php-driver). The package suggests repairing the DB, but since that's failing, we have been stuck on this for weeks. Any help would be much appreciated.

Hello @indrajith_N_A ,

Welcome to the community!! :wave:

Could you please provide more information to help us understand this issue in detail?

  • How was this clone created?
  • What version of MongoDB and driver are you using?
  • Does exporting the collection using mongoexport show the same error?
  • What’s the error you see during index building?
  • Are you using mongod --repair for this, and what does it say when it fails?
  • What package is this, and what exactly does its message say?

Regards,
Tarun

Hi Tarun,

  • Does exporting the collection using mongoexport show the same error?
    We tried the export feature in MongoDB Compass. On exporting the data, we get the error “Invalid UTF-8 string in BSON document”. We could try mongoexport if you think it would make any difference.
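To make the “Invalid UTF-8 string” message concrete, here is a minimal sketch of what such a check amounts to: try to decode the raw bytes of a string value as UTF-8 and report where decoding first fails. This is only an illustration, not how Compass or the PHP driver validates BSON, and `first_invalid_utf8_offset` is a hypothetical helper name.

```python
from typing import Optional


def first_invalid_utf8_offset(raw: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return None


if __name__ == "__main__":
    # Hypothetical sample values for a field such as 'domain'.
    samples = [b"example.com", b"caf\xc3\xa9.example", b"exam\xffple.com"]
    for value in samples:
        print(value, first_invalid_utf8_offset(value))
```

A value that fails this kind of check inside a stored document points at damaged bytes on disk rather than at a query problem.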

What’s the error you see during index building?

  • After a couple of days, the server crashes and the DB becomes unusable. We doubled the RAM and tried again, and the same thing happened. We will try to find the error.

  • Are you using mongod --repair for this, and what does it say when it fails?
    We did try this option. I will find out if any error messages can be dug out.

  • The package name is mentioned at the top. The error message is: Detected corrupt BSON data for field path ‘domain’. It happens when filtering some data.

Hi @indrajith_N_A

We took an AMI of the AWS instance.

If the idea is to copy a deployment from one server to another, using the tested and supported MongoDB backup methods may be better. In fact, I would recommend you try using the supported backup & restore methods to see if it results in the same error you’re seeing with the AMI process.
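As a rough illustration of the dump-and-restore route, the sketch below builds a `mongodump` command for the source and a `mongorestore` command for the target. The host names and archive path are placeholders, the helper names are hypothetical, and the database name is taken from the namespace in this thread. It assumes the `--host`, `--db`, `--archive`, and `--gzip` options are available in your version of the tools, so please check them against your installation. By default it only prints the commands.

```python
import subprocess
from typing import List


def build_dump_cmd(host: str, db: str, archive: str) -> List[str]:
    """mongodump command that writes one database to a gzip-compressed archive."""
    return ["mongodump", f"--host={host}", f"--db={db}",
            f"--archive={archive}", "--gzip"]


def build_restore_cmd(host: str, archive: str) -> List[str]:
    """mongorestore command that loads the same archive into the target server."""
    return ["mongorestore", f"--host={host}", f"--archive={archive}", "--gzip"]


def copy_database(source: str, target: str, db: str, archive: str,
                  dry_run: bool = True) -> List[List[str]]:
    """Dump from source, then restore into target; dry_run only prints the commands."""
    commands = [build_dump_cmd(source, db, archive),
                build_restore_cmd(target, archive)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing step instead of restoring a bad dump.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Placeholder hosts and path; replace with your own values.
    copy_database("source.example.internal:27017",
                  "clone.example.internal:27017",
                  "breachaware",
                  "/backup/breachaware.archive.gz")
```

If the dump step itself reports corrupt documents, that would suggest the problem also exists on the source deployment and not only on the clone.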

We could try out MongoExport if you think it could make any difference.

It would be interesting to see if mongoexport encounters the same issue. Please attach all error messages from mongoexport if this is possible.

Tarun

This is the error that we’re getting while indexing:

2022-07-04T01:59:01.203+0000 I CONTROL  [initandlisten]
2022-07-04T01:59:01.229+0000 I STORAGE  [initandlisten] Expected index data is missing, rebuilding. NS: breachaware.breached_accounts Index: _id_ Ident: index-3--8744299071372410985
2022-07-04T01:59:01.229+0000 I STORAGE  [initandlisten] Expected index data is missing, rebuilding. NS: breachaware.breached_accounts Index: domain_alias_compound_index Ident: index-4--8744299071372410985
2022-07-04T01:59:01.229+0000 I STORAGE  [initandlisten] Expected index data is missing, rebuilding. NS: breachaware.breached_accounts Index: breach_id_index Ident: index-5--8744299071372410985
2022-07-04T01:59:01.229+0000 I INDEX    [initandlisten] found 2 index(es) that wasn't finished before shutdown
2022-07-04T01:59:01.229+0000 F -        [initandlisten] Fatal assertion 40592 InternalError: IndexCatalog has left over indexes that must be cleared ns: breachaware.breached_accounts at src/mongo/db/db.cpp 465
2022-07-04T01:59:01.229+0000 F -        [initandlisten]

***aborting after fassert() failure
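For a long log, a small script can list which indexes mongod reports as missing. The sketch below assumes the log line layout shown above ("Expected index data is missing, rebuilding. NS: ... Index: ... Ident: ..."); the regular expression and the `missing_indexes` helper are illustrative and may need adjusting for other MongoDB versions or log formats.

```python
import re
from typing import List, Tuple

# Matches the "Expected index data is missing" lines in the format seen in this log.
MISSING_INDEX = re.compile(
    r"Expected index data is missing, rebuilding\. "
    r"NS: (?P<ns>\S+) Index: (?P<index>\S+) Ident: (?P<ident>\S+)"
)


def missing_indexes(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, index name, ident) for every missing-index line."""
    return [(m["ns"], m["index"], m["ident"])
            for m in MISSING_INDEX.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "2022-07-04T01:59:01.229+0000 I STORAGE  [initandlisten] Expected index data "
        "is missing, rebuilding. NS: breachaware.breached_accounts Index: _id_ "
        "Ident: index-3--8744299071372410985\n"
        "2022-07-04T01:59:01.229+0000 I STORAGE  [initandlisten] Expected index data "
        "is missing, rebuilding. NS: breachaware.breached_accounts Index: "
        "breach_id_index Ident: index-5--8744299071372410985\n"
    )
    for ns, index, ident in missing_indexes(sample):
        print(ns, index, ident)
```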

This looks like a data corruption issue. Could you help me with the following?

  • Are you seeing any issues with the original database (not the clone)? That is, in the collection breachaware.breached_accounts, are the aforementioned indexes (_id_, domain_alias_compound_index, and breach_id_index) intact and functional?

  • The message is typically displayed when there is disk-level data corruption. If there is no issue on the original database, and this is only present on the clone, then the clone has corrupted data.

  • Please use a supported backup & restore method for moving data between MongoDB instances, or use an initial sync on a replica set. Other methods are not supported, may cause issues with the clone/backup, and can possibly also affect the integrity of the original database if the backup method is especially invasive (e.g. not shutting down MongoDB before copying data, or inadvertently modifying the dbpath while mongod is running; any of these can have catastrophic consequences).

Tarun
