Mongod random crashes on Windows: FileRenameFailed

Good day.

First, I know there are many posts that relate to my issue, but none provided me with a fix. :frowning:

Running:
Mongo Server Community 5.0.5
Windows Server 2019

The service runs as a domain user to which we gave full control over the root path D:\Mongo\ (which contains the data and log folders). Additionally, we’ve set up our AV to exclude D:\Mongo\ from scanning!

Every so often (too often!) the mongod.exe process still crashes with a FileRenameFailed: Access is denied… error. Here’s a snippet of the log file:

{"t":{"$date":"2023-03-02T13:18:50.717-05:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.10.42.251:56707","uuid":"6a36177a-b425-400a-a1a9-1fc735f56ab0","connectionId":165612,"connectionCount":9}}
{"t":{"$date":"2023-03-02T13:18:58.738-05:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165612","msg":"Connection ended","attr":{"remote":"10.10.42.251:56707","uuid":"6a36177a-b425-400a-a1a9-1fc735f56ab0","connectionId":165612,"connectionCount":8}}
{"t":{"$date":"2023-03-02T13:18:59.738-05:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.10.42.251:56884","uuid":"0c8e1898-f54c-49dd-8605-bb31d7f2b909","connectionId":165613,"connectionCount":9}}
{"t":{"$date":"2023-03-02T13:19:11.933-05:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165613","msg":"Connection ended","attr":{"remote":"10.10.42.251:56884","uuid":"0c8e1898-f54c-49dd-8605-bb31d7f2b909","connectionId":165613,"connectionCount":8}}

{"t":{"$date":"2023-03-02T13:19:11.990-05:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"ftdc","msg":"Writing fatal message","attr":{"message":"terminate() called. An exception is active; attempting to gather more information"}}
{"t":{"$date":"2023-03-02T13:19:12.032-05:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"ftdc","msg":"Writing fatal message","attr":{"message":"DBException::toString(): FileRenameFailed: Access is denied\nActual exception type: class mongo::error_details::ExceptionForImpl<37,class mongo::AssertionException>\n"}}

{"t":{"$date":"2023-03-02T13:19:12.766-05:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.10.42.251:57108","uuid":"9d15f9b4-8e8a-4659-9377-a78356a0c731","connectionId":165614,"connectionCount":9}}
{"t":{"$date":"2023-03-02T13:19:12.766-05:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165614","msg":"Connection ended","attr":{"remote":"10.10.42.251:57108","uuid":"9d15f9b4-8e8a-4659-9377-a78356a0c731","connectionId":165614,"connectionCount":8}}
{"t":{"$date":"2023-03-02T13:19:13.768-05:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.10.42.251:57120","uuid":"cd4b4ba0-4f96-4494-8073-7d408e924f4f","connectionId":165615,"connectionCount":9}}
{"t":{"$date":"2023-03-02T13:19:13.768-05:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165615","msg":"Connection ended","attr":{"remote":"10.10.42.251:57120","uuid":"cd4b4ba0-4f96-4494-8073-7d408e924f4f","connectionId":165615,"connectionCount":8}}
{"t":{"$date":"2023-03-02T13:19:14.390-05:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"Checkpointer","msg":"WiredTiger message","attr":{"message":"[1677781154:390576][14408:140723038999488], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 5515, snapshot max: 5515 snapshot count: 0, oldest timestamp: (1677781152, 1) , meta checkpoint timestamp: (1677781152, 1) base write gen: 108733"}}
{"t":{"$date":"2023-03-02T13:19:14.770-05:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.10.42.251:57137","uuid":"43c1c2bf-d3c5-49d3-bb1b-4e83e16e1440","connectionId":165616,"connectionCount":9}}
{"t":{"$date":"2023-03-02T13:19:14.770-05:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165616","msg":"Connection ended","attr":{"remote":"10.10.42.251:57137","uuid":"43c1c2bf-d3c5-49d3-bb1b-4e83e16e1440","connectionId":165616,"connectionCount":8}}
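In a log this noisy, the fatal entries are easy to miss. A minimal sketch for pulling them out, assuming the log uses the one-JSON-object-per-line format shown above, where fatal entries carry "s":"F". The function names are made up for illustration, and the log path in the comment is hypothetical. In the snippet above, both fatal entries carry "ctx":"ftdc".

```python
import json
import sys
from typing import Iterable, Iterator


def fatal_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log entries whose severity field "s" is "F" (fatal)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if isinstance(entry, dict) and entry.get("s") == "F":
            yield entry


def summarize(entry: dict) -> str:
    """Return 'timestamp [ctx] message' for one log entry."""
    when = entry.get("t", {}).get("$date", "?")
    ctx = entry.get("ctx", "?")
    # Prefer the detailed attr.message, fall back to the short msg field.
    message = entry.get("attr", {}).get("message", entry.get("msg", ""))
    return f"{when} [{ctx}] {str(message).strip()}"


# Hypothetical usage: python fatal_scan.py D:\Mongo\log\mongod.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for fatal in fatal_entries(fh):
            print(summarize(fatal))
```

The "ctx" value names the thread that logged the entry, which helps narrow down which part of mongod hit the failed rename.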

In all the posts out there, none of them resolved this crash for us:

  • Most of them relate to an AV scanning files within the /data/ folder: we’ve excluded the folder from scanning!

  • Some talk about permissions problems: we’ve given full control to the user running mongod within the root of the Mongo files!

  • I’ve even seen posts talking about a bad server locale setup, but that would apply when the log shows Unicode characters not processed properly (the log would show something like {"message":"DBException::toString(): FileRenameFailed: \ufffdv\ufffd\ufffd...). That doesn’t seem to be our case, judging from our log. Plus, our server is set to an “English” locale.


I’m running out of ideas here… Upgrade to latest Mongo? But why haven’t I found anything regarding this that says you need to upgrade if that’s the case?

Any ideas would be super appreciated. Much thanks for your time folks.

Regards,
Patrick

Hi @Patrick_Roy

Sorry you’re having difficulty with this issue. Unfortunately, I believe the FileRenameFailed error originates outside the server, so it’s typically an OS-level issue.

One thing I can think of is SERVER-58085, which will warn you if the path is a network drive (which is known to sometimes result in this). SERVER-28194 is another, but that was fixed a long time ago.

Since you’re running version 5.0.5 and the latest in the 5.0 series is 5.0.15, I would start by upgrading. Upgrading to the latest version ensures that you’re not seeing an already-fixed issue, so it’s usually a good first step.

If your dbpath is not on a network drive, and you have upgraded to 5.0.15, then perhaps the best option is to open a SERVER ticket describing the situation.
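For the network-drive check, here is a rough sketch of how one might test a dbPath from Python. The function names and the example server/share names are hypothetical. The UNC test is plain string handling; the mapped-drive test calls the Win32 GetDriveTypeW function through ctypes and therefore only works on Windows. It has not been tried against SUBST drives, junctions, or mount points, so treat a "local" answer as a hint rather than proof.

```python
import ntpath
import sys

DRIVE_REMOTE = 4  # GetDriveTypeW return value for a network drive


def is_unc_path(path: str) -> bool:
    r"""True if the path is written in UNC form (\\server\share\...)."""
    p = path.replace("/", "\\")
    if p.upper().startswith("\\\\?\\UNC\\"):
        return True  # extended-length UNC path
    if p.startswith("\\\\?\\"):
        p = p[4:]  # extended-length local path such as \\?\D:\Mongo
    drive, _ = ntpath.splitdrive(p)
    return drive.startswith("\\\\")


def is_mapped_network_drive(path: str) -> bool:
    """True if the path's drive letter maps to a network share (Windows only)."""
    if sys.platform != "win32":
        raise OSError("GetDriveTypeW is only available on Windows")
    import ctypes

    drive, _ = ntpath.splitdrive(path)
    if len(drive) != 2 or drive[1] != ":":
        return False  # no drive letter to inspect
    return ctypes.windll.kernel32.GetDriveTypeW(drive + "\\") == DRIVE_REMOTE


def dbpath_on_network(path: str) -> bool:
    """Combine both checks; off Windows only the UNC check can run."""
    if is_unc_path(path):
        return True
    if sys.platform == "win32":
        return is_mapped_network_drive(path)
    return False
```

Run it on the server itself, for example dbpath_on_network(r"D:\Mongo\data"), since drive mappings are specific to the machine and the account.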

Best regards
Kevin

Hello @kevinadi. Thanks for your reply.

The server that is currently crashing at random is our arbiter (we are running PSA). We have also had the crash on another server that runs only one instance (primary only, a testing server). Both servers that produced the crash have the dbPath set to a local disk.

Our next step, since we don’t want to fall too far behind on upgrades, is to upgrade to the latest Mongo 6.0.x LTS version and hope all the crashes magically go away :wink: Still, I’m puzzled as to why we’re getting the crash. I mean, if an upgrade fixes it, then I should be able to find the relevant fix that resolves the issue, but I haven’t found anything yet…

Hi folks, just to share an update on this particular crash… We know that the crash would occasionally occur when Mongo renamed this file: \diagnostic.data\metrics.interim to metrics.interim.temp.
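Some background on why a rename can fail this way: on Windows, renaming or replacing a file can be refused with "Access is denied" while another process holds a handle on it that does not allow sharing. The sketch below shows a generic write-to-temp-then-rename helper that retries instead of giving up on the first failure. It is only an illustration of the pattern, not mongod's code, and nothing in this thread establishes whether mongod retries. The function names and the ".temp" suffix handling are assumptions.

```python
import os
import time


def replace_with_retry(src: str, dst: str, attempts: int = 5, delay: float = 0.1) -> None:
    """Rename src over dst, retrying when the OS reports access denied."""
    for attempt in range(1, attempts + 1):
        try:
            os.replace(src, dst)
            return
        except PermissionError:
            # On Windows this covers "Access is denied" and sharing violations,
            # e.g. when a scanner briefly holds the file open.
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # back off a little longer each time


def write_atomically(path: str, data: bytes) -> None:
    """Write data to a temporary sibling file, then rename it into place."""
    tmp = path + ".temp"
    with open(tmp, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before renaming
    replace_with_retry(tmp, path)
```

A short retry only papers over a handle that is held briefly; if something holds the file for longer, the rename still fails after the last attempt.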

Here are a few steps I took to try to bypass the manipulation of this file (it’s only a diagnostic/metrics file of some kind, so presumably not really needed?):

  1. Upgraded our Mongo instances to MongoDB 6.0.5 Community
  2. Tried to forcefully disable free Monitoring in mongod.cfg:
    cloud:
      monitoring:
        free:
          state: off
    
  3. Tried to forcefully disable diagnostic data collection in mongod.cfg:
    setParameter:
      diagnosticDataCollectionEnabled: false
    
  4. Finally, tried to disable telemetry with: mongosh --nodb --eval "disableTelemetry()"

Results: it seems that the last point (4), disabling telemetry, is what fixed the issue. I am not sure, though, whether it’s a combination of all the points that did it… But so far, it’s been up for over a month without a crash (it was crashing 3-4 times a month before!).

Regardless, we definitely shouldn’t need to disable all that stuff. To me, it looks like there’s a bug somewhere that shows up with specific setups (but which ones, I can’t say)…

Cheers! Pat

I just had the same issue with MongoDB 4.4.22.

I enabled file auditing on Windows and it appears Kaspersky is to blame:

Object:
	Object Server:		Security
	Object Type:		File
	Object Name:		C:\mongo\db\diagnostic.data\metrics.interim
	Handle ID:		0x82c
	Resource Attributes:	S:AI

Process Information:
	Process ID:		0xcf8
	Process Name:		C:\Program Files (x86)\Kaspersky Lab\Kaspersky Security for Windows Server\kavfswp.exe

Access Request Information:
	Accesses:		Write attributes
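
If auditing produces many such events, a small parser can tally which processes touch the file. This is a sketch that assumes each event body has been exported as plain text in the "Label:<tab>Value" layout shown above. The function names are made up, and the field labels depend on the Windows display language, so both the English and the German process label are checked.

```python
from collections import Counter
from typing import Iterable, Optional

# Label of the process-name field in English and German event text.
PROCESS_LABELS = ("Process Name", "Prozessname")


def parse_event_fields(text: str) -> dict:
    """Map each 'Label:<tabs>Value' line of an event body to label -> value."""
    fields = {}
    for raw in text.splitlines():
        # Split on the first colon only, so values like C:\... stay intact.
        label, sep, value = raw.strip().partition(":")
        value = value.strip()
        if sep and value:  # section headers have no value and are skipped
            fields[label.strip()] = value
    return fields


def accessing_process(fields: dict) -> Optional[str]:
    """Return the executable that requested access, if the event names one."""
    for label in PROCESS_LABELS:
        if label in fields:
            return fields[label]
    return None


def count_by_process(events: Iterable[str]) -> Counter:
    """Count audit events per accessing process."""
    counts = Counter()
    for event in events:
        process = accessing_process(parse_event_fields(event))
        if process:
            counts[process] += 1
    return counts
```

Calling count_by_process(events).most_common() then lists the most frequent accessors first.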