"shard version not ok: version mismatch detected for"

Good afternoon,

MongoDB server version: 3.6.16 on RHEL/CentOS.

I am trying to insert a record into a sharded collection, and I am getting a strange error that I have Googled the heck out of. To summarize, the insert says: “shard version not ok: version mismatch detected for MYDATABASE.MYCOLLECTION”. What causes this? I have tried a flushRouterConfig, to no avail.

The following is the log line, which I have changed to protect the innocent:

2020-07-22T16:23:45.805+0000 I COMMAND [conn1189005] command MYDATABASE.MYCOLLECTION command: insert { insert: “MYCOLLECTION”, bypassDocumentValidation: false, ordered: false, documents: 50, shardVersion: [ Timestamp(57023, 3), ObjectId(‘57c5ff81724c2e70c623e733’) ], lsid: { id: UUID(“fade92f1-0993-4db7-af03-1b6066628f8c”), uid: BinData(0, 30D9CB0F31D33F7912528ADD7F28D77AA3ADBBDF1B6E9C50BFB8163217CE97C8) }, $clusterTime: { clusterTime: Timestamp(1595435024, 596), signature: { hash: BinData(0, E82B2EB1A3E65A615144ABFD585CC0CB8DB8E2D0), keyId: 6817327097028018329 } }, $client: { driver: { name: “mongo-java-driver”, version: “3.9.1” }, os: { type: “Linux”, name: “Linux”, architecture: “amd64", version: “3.10.0-1127.el7.x86_64” }, platform: “Java/Oracle Corporation/1.8.0_181-b13", mongos: { host: “queryrouter:27018”, client: “xxx.xxx.xxx.xxx:49626", version: “3.6.16” } }, $configServerState: { opTime: { ts: Timestamp(1595435023, 177), t: 4071 } }, $db: “MYDATABASE” } ninserted:0 exception: shard version not ok: version mismatch detected for MYDATABASE.MYCOLLECTION ( ns : MYDATABASE.MYCOLLECTION, received : 57023|3||57c5ff81724c2e70c623e733, wanted : 57024|3||57c5ff81724c2e70c623e733 ) code:StaleConfig numYields:0 reslen:14363 locks:{ Global: { acquireCount: { r: 4, w: 2 } }, Database: { acquireCount: { r: 1, w: 2 } }, Collection: { acquireCount: { r: 1, w: 2 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 1432380 } } } protocol:op_msg 1432ms

It’s always on the same object ID, and we have 2000-3000 of these errors daily. Mongo cannot find the ObjectId:

mongos> db.MYCOLLECTION.find({ _id: ObjectId("57c5ff81724c2e70c623e733") });
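
A note for readers of the log line above: the `received` and `wanted` values appear to be shard versions written as `major|minor||epoch`. If that reading is right, the trailing ObjectId is the collection's epoch rather than the `_id` of a document, which would explain why the find returns nothing. Below is a minimal JavaScript sketch for pulling both versions out of such a log line. The helper name `parseVersionMismatch` is made up for illustration and is not part of MongoDB.

```javascript
// Hypothetical helper: extract the "received" and "wanted" shard versions
// from a StaleConfig log line. Assumes the major|minor||epoch text format
// seen in the log line above.
function parseVersionMismatch(line) {
  const re = /received : (\d+)\|(\d+)\|\|([0-9a-f]{24}), wanted : (\d+)\|(\d+)\|\|([0-9a-f]{24})/;
  const m = re.exec(line);
  if (m === null) {
    return null; // not a version-mismatch line
  }
  const toVersion = (major, minor, epoch) => ({
    major: Number(major),
    minor: Number(minor),
    epoch: epoch,
  });
  return {
    received: toVersion(m[1], m[2], m[3]),
    wanted: toVersion(m[4], m[5], m[6]),
  };
}

// Fragment copied from the log line in this post.
const sample =
  "( ns : MYDATABASE.MYCOLLECTION, received : 57023|3||57c5ff81724c2e70c623e733, " +
  "wanted : 57024|3||57c5ff81724c2e70c623e733 )";
console.log(JSON.stringify(parseVersionMismatch(sample)));
```

Run over a whole log file, this makes it easier to see whether the gap between the two versions stays the same or keeps moving.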

Also, running the command below and restarting all the query routers did not fix it:

db.adminCommand({ flushRouterConfig: "MYDATABASE.MYCOLLECTION" });

Please assist.

Hello @William_Crowell, welcome to the community!

It would be helpful for others to read and respond to your question if you could apply the proper log formatting to your post. Please add spacing to break up long blocks of text/logs and improve readability.
You may want to review the Getting Started guide from @Jamie, which has some great additional tips and information.

Concerning your actual problem: it seems that you are running different versions.

You can check the shards' mongodb.log and look for something like:

Cheers,
Michael


Hey Michael,

Thanks for your reply. I just added a new line after each line to improve the readability, and I am providing a public gist link to it:

https://gist.githubusercontent.com/wcrowell/e29eb4b98d1a1fd5e78ca713876813b6/raw/649f01af9d16d523271147de6b43e219ee5270ec/formatted.txt

Let me know if I can do anything else to improve readability.

I did not see the message: “requested shard version differs from config shard version for my_db.my_collection, requested version is”.

Regards,

Bill Crowell

I just checked the last week of the Mongo log. We do not have “requested shard version differs” anywhere in it.

Hi @William_Crowell and @Fory_Horio,

The errors indicate that at some point the sharding metadata was stale/out of sync.

Usually, invalidating the cache on the mongos/shards should solve this:

  • However, where to run it depends on the version of the cluster.

Can you confirm the issue is gone now?

Best regards,
Pavel

Pavel,
Good morning. Do you mean running: db.adminCommand({ flushRouterConfig: "PM_AUDIT.AUDIT" })?

Or another method to invalidate the cache, like this: https://docs.mongodb.com/manual/reference/command/invalidateUserCache/ ?

Thanks for your reply.

Regards,

Bill Crowell

Hi @William_Crowell,

Yes, I meant flushRouterConfig.

Yes, you can start at the collection level and escalate from there.
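
To make "start with the collection level and escalate" concrete, here is a rough sketch of the three forms of the command, narrowest first. It assumes a server version where `flushRouterConfig` accepts a collection namespace or a database name (to my understanding 3.6.11+ and 4.0.6+, so please check the documentation for your release). The helper name `flushEscalation` is invented for illustration.

```javascript
// Hypothetical helper: build flushRouterConfig commands from the narrowest
// to the broadest scope. Each one would be passed to db.adminCommand() in
// the mongo shell.
function flushEscalation(dbName, collName) {
  return [
    { flushRouterConfig: dbName + "." + collName }, // one collection
    { flushRouterConfig: dbName },                  // whole database
    { flushRouterConfig: 1 },                       // entire routing cache
  ];
}

const commands = flushEscalation("PM_AUDIT", "AUDIT");
commands.forEach((cmd) => console.log(JSON.stringify(cmd)));
// In the mongo shell, connected to the instance whose cache should be
// refreshed, each step would then be run as: db.adminCommand(cmd)
```

The idea is to stop at the first step that clears the error, since the broader forms discard more cached routing information.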

Best regards
Pavel

Pavel,

Thanks for your reply again. Would this need to be run on the query routers, the mongod database instances, and the mongod configuration database instances?

Regards,

William Crowell

We have run db.adminCommand({ flushRouterConfig: "PM_AUDIT.AUDIT" }) on all the QRTs and restarted mongos several times, but the error never disappears. A few months ago we had a similar problem, and at that time flushing and restarting mongos cured it. But not this time, even though we tried several times over a few days.

Hi @Fory_Horio

Have you run it on all mongos instances and shards?

What is the version of the cluster and the sharding distribution of this collection?

Can you consider failing over the shards?

Best regards
Pavel

Yes, I did, several times, on all the QRTs on all shards. It did not solve the problem.

We have two shards with two QRTs for each shard, four QRTs in total. I tried one more round of flushing and restarting mongos against all four QRTs. The last error BEFORE the restarts is below. We will see.

2020-07-27T21:24:19.964+0000 I COMMAND [conn1221010] command PM_AUDIT.AUDIT command: insert { insert: “AUDIT”, bypassDocumentValidation: false, ordered: false, documents: 50, shardVersion: [ Timestamp(60823, 3), ObjectId(‘57c5ff81724c2e70c623e733’) ], lsid: { id: UUID(“0cc98ccd-4699-48fd-b418-dac54bf66319”), uid: BinData(0, 30D9CB0F31D33F7912528ADD7F28D77AA3ADBBDF1B6E9C50BFB8163217CE97C8) }, $clusterTime: { clusterTime: Timestamp(1595885058, 260), signature: { hash: BinData(0, 7469BFFD3F6BAB1F6127C9839A57F6ADFC72AB01), keyId: 6817327097028018329 } }, $client: { driver: { name: “mongo-java-driver”, version: “3.9.1” }, os: { type: “Linux”, name: “Linux”, architecture: “amd64”, version: “3.10.0-1127.el7.x86_64” }, platform: “Java/Oracle Corporation/1.8.0_181-b13”, mongos: { host: “monqrt-east-1b:27018”, client: “10.1.2.121:48620”, version: “3.6.16” } }, $configServerState: { opTime: { ts: Timestamp(1595885057, 177), t: 4071 } }, $db: “PM_AUDIT” } ninserted:0 exception: shard version not ok: version mismatch detected for PM_AUDIT.AUDIT ( ns : PM_AUDIT.AUDIT, received : 60823|3||57c5ff81724c2e70c623e733, wanted : 60824|3||57c5ff81724c2e70c623e733 ) code:StaleConfig numYields:0 reslen:14363 locks:{ Global: { acquireCount: { r: 4, w: 2 } }, Database: { acquireCount: { r: 1, w: 2 } }, Collection: { acquireCount: { r: 1, w: 2 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 1214048 } } } protocol:op_msg 1214ms

After flushing and restarting ALL QRTs, the error came back again.

2020-07-27T23:16:53.597+0000 I COMMAND [conn1221486] command PM_AUDIT.AUDIT command: insert { insert: “AUDIT”, bypassDocumentValidation: false, ordered: false, documents: 50, shardVersion: [ Timestamp(60855, 3), ObjectId(‘57c5ff81724c2e70c623e733’) ], lsid: { id: UUID(“694c2e56-c025-48e9-9e60-e71248a44bd3”), uid: BinData(0, 30D9CB0F31D33F7912528ADD7F28D77AA3ADBBDF1B6E9C50BFB8163217CE97C8) }, $clusterTime: { clusterTime: Timestamp(1595891811, 296), signature: { hash: BinData(0, 786285F2A69138520A346EBF33FA7B3CFEA96DD2), keyId: 6817327097028018329 } }, $client: { driver: { name: “mongo-java-driver”, version: “3.9.1” }, os: { type: “Linux”, name: “Linux”, architecture: “amd64”, version: “3.10.0-1127.el7.x86_64” }, platform: “Java/Oracle Corporation/1.8.0_181-b13”, mongos: { host: “monqrt-east-1c:27018”, client: “10.1.3.191:41186”, version: “3.6.16” } }, $configServerState: { opTime: { ts: Timestamp(1595891811, 3), t: 4071 } }, $db: “PM_AUDIT” } ninserted:0 exception: shard version not ok: version mismatch detected for PM_AUDIT.AUDIT ( ns : PM_AUDIT.AUDIT, received : 60855|3||57c5ff81724c2e70c623e733, wanted : 60854|3||57c5ff81724c2e70c623e733 ) code:StaleConfig numYields:0 reslen:14363 locks:{ Global: { acquireCount: { r: 7, w: 3 } }, Database: { acquireCount: { r: 2, w: 3 } }, Collection: { acquireCount: { r: 2, w: 2, W: 1 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 1549896 } } } protocol:op_msg 1994ms
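
Comparing the two log lines above: before the restarts, `received` (60823) is lower than `wanted` (60824), while afterwards it is higher (60855 against 60854). Assuming `received` is the version the mongos sent and `wanted` is the version the shard holds (one reading of the message, not confirmed in this thread), the sketch below reports which side looks behind. The function name `staleSide` is hypothetical.

```javascript
// Hypothetical helper: compare two "major|minor||epoch" strings and report
// which side appears to be behind. Assumption: "received" comes from the
// router (mongos) and "wanted" from the shard.
function staleSide(received, wanted) {
  const parse = (v) => {
    const [version, epoch] = v.split("||");
    const [major, minor] = version.split("|").map(Number);
    return { major, minor, epoch };
  };
  const r = parse(received);
  const w = parse(wanted);
  if (r.epoch !== w.epoch) return "epoch differs";
  if (r.major !== w.major) return r.major < w.major ? "router behind" : "shard behind";
  if (r.minor !== w.minor) return r.minor < w.minor ? "router behind" : "shard behind";
  return "in sync";
}

// Values copied from the two log lines above.
const epoch = "57c5ff81724c2e70c623e733";
console.log(staleSide("60823|3||" + epoch, "60824|3||" + epoch));
console.log(staleSide("60855|3||" + epoch, "60854|3||" + epoch));
```

If the direction really does flip like this, flushing only the routers may not be enough, which fits the earlier question about running the command on the shards as well.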

Hi @Fory_Horio,

It seems like you might have a performance or locking issue, which can be a result of overloaded balancing or sharding resources.

Those are best covered by MongoDB support. I suggest you engage with support.

If you wish, I can put you in contact with a sales representative to continue the investigation.

Thanks
Pavel