Missing files using GridFS

We’re using GridFS to store files such as PDFs and Word documents. We have a cluster set up, and our app stores documents using a GUID for the filename. We’ve recently noticed that some documents are missing when we try to retrieve them. I’ve looked at the code and it looks resilient: once a document is uploaded, a Find request is made for it to ensure it was uploaded, and it is returned to the calling function.

Initially I thought it might be a network issue, so I induced a connection problem by turning off the MongoDB instance the app was connected to during a file upload. As expected, this caused an exception, and the document was not uploaded at all.

In the case of the missing documents, we know they were uploaded because the GUID is referenced from a record held in another database, and that record is only created after the document has been uploaded to MongoDB.

Any ideas what it could be? I wonder whether it’s a replication issue or the data is getting corrupted. Could either of these cause the missing documents?

Did a replica set election occur between the time you got your confirmation (using your Find request) and the time you discovered that some files were missing? It could be that the oplog entries for the write had not been propagated to the secondary that became primary. This should not happen if you use write majority, and you should use it, since you have already implemented a mechanism to ensure the file is written. Write majority is a safer mechanism.
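To illustrate the write majority suggestion, here is a minimal sketch using PyMongo's `GridFSBucket`. It assumes a Python client (the thread does not say which driver the app uses), and the helper names `make_majority_bucket` and `store_and_verify` are made up for this example. The idea is that the upload is only acknowledged once a majority of the replica set has it, and the follow-up find reads from the primary.

```python
import io


def make_majority_bucket(uri, db_name):
    """Build a GridFS bucket whose writes wait for a majority of the replica set."""
    # Imported lazily so store_and_verify() below can be tried without a server.
    from gridfs import GridFSBucket
    from pymongo import MongoClient, ReadPreference
    from pymongo.write_concern import WriteConcern

    db = MongoClient(uri)[db_name]
    return GridFSBucket(
        db,
        # Acknowledge writes only after a majority of members have them.
        write_concern=WriteConcern(w="majority"),
        # Run the verification read against the primary.
        read_preference=ReadPreference.PRIMARY,
    )


def store_and_verify(bucket, filename, data):
    """Upload bytes under `filename`, then confirm the file can be found."""
    file_id = bucket.upload_from_stream(filename, io.BytesIO(data))
    found = any(True for _ in bucket.find({"_id": file_id}))
    if not found:
        raise RuntimeError(f"upload of {filename!r} could not be verified")
    return file_id
```

With a real deployment this would be called as `store_and_verify(make_majority_bucket(uri, "mydb"), guid, pdf_bytes)`, where `uri`, `"mydb"`, `guid` and `pdf_bytes` are placeholders. A write acknowledged this way should survive an election, which the plain find check on its own does not guarantee.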

With a replication issue or data corruption, you would see more problems than just specific documents going missing.

So if you used majority writes, or if there was no election, and there are no issues apart from a few missing documents, then the most logical explanation is that someone or something deleted the documents.
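One way to look for evidence of deletions is to search the replica set oplog for delete operations on the GridFS files collection. The sketch below assumes PyMongo and the default bucket name `fs`; the function name `find_gridfs_deletes` is hypothetical. The oplog is a capped collection, so only reasonably recent deletes will still be there, and each entry records the deleted document's `_id`, not the GUID filename.

```python
def find_gridfs_deletes(oplog, db_name, bucket_name="fs"):
    """Return (timestamp, _id) pairs for deletes on <db_name>.<bucket_name>.files."""
    namespace = f"{db_name}.{bucket_name}.files"
    # In the oplog, "op": "d" marks a delete and "o" holds the removed _id.
    cursor = oplog.find({"ns": namespace, "op": "d"})
    return [(entry["ts"], entry["o"]["_id"]) for entry in cursor]


# Usage against a real replica set (uri and "mydb" are placeholders):
#   from pymongo import MongoClient
#   oplog = MongoClient(uri)["local"]["oplog.rs"]
#   for ts, file_id in find_gridfs_deletes(oplog, "mydb"):
#       print(ts, file_id)
```

If this returns entries whose timestamps fall after the uploads were confirmed, that would point to something deleting the files instead of a replication problem.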
