Single Shard failure bringing entire system down

We are running an on-prem self-hosted MongoDB community database where we have some sharded collections. I am going to simplify the system for the sake of this question.

Let’s say we have 5 physical computers. Four of them run the physical shards of the DB, and the fifth runs the Mongo router (mongos) and config service. The data stored on each shard is very specific to that shard, not random: we have defined the shard key so that data generated by our application on a given computer is stored in the shard on that same computer. From every system, we can read the data stored on the other shards through the router service without issue.
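For context, this kind of machine-local data placement is usually implemented with zone sharding. A minimal sketch in mongosh, assuming a shard key like { machineId: 1 } — the shard, zone, namespace, and key names here are placeholders, not our real schema:

```javascript
// Sketch only: pin the shard-key range for machine 1 to the shard running on it.
// "shard1ReplSet", "mydb.myShardedCollection" and "machineId" are assumed names.
sh.addShardToZone("shard1ReplSet", "zone-machine1")
sh.updateZoneKeyRange(
  "mydb.myShardedCollection",
  { machineId: 1 },   // inclusive lower bound of the range
  { machineId: 2 },   // exclusive upper bound of the range
  "zone-machine1"     // zone name
)
```

With one zone per machine, the balancer keeps each machine's chunks on its local shard.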

The problem arises when one of the shards goes down: as soon as that happens, the entire system goes down with it. The Mongo router service is no longer able to return data even for the shards that are still up and running. If I connect to the other shards directly via their ports, it works, but obviously that is not practical because we have too many shards to search manually.

I tried to simulate this at a smaller scale. When I ran

const shardedCollection = database.collection("myShardedCollection");
await shardedCollection.find().toArray();

after bringing down shard 2 manually, this is the error I got:

MongoServerError: Encountered non-retryable error during query :: caused by :: Could not find host matching read preference { mode: "primary" } for set shard2ReplSet

Note - We have already added replica sets to ensure each shard doesn’t go down, but I excluded that from the problem statement because what I primarily want to know is: when a shard becomes completely unavailable (with or without replica sets), how can we still get the data from the other shards without any issue?

At that time, is the shard “shard2ReplSet” working well? Is there a primary in that shard? How many replica nodes do you have in each shard?

This query targets all shards of the collection, which is why this particular query fails.

Chunks on the remaining shards are still available, but any query that needs to read from the unavailable shard will fail. Also, if the unavailable shard is the primary shard for a database, then all the unsharded collections of that database become unavailable too.

This behaviour is described in the sharding / high availability documentation.

Given this sharded cluster:

sh.status()
shardingVersion
{ _id: 1, clusterId: ObjectId('65e5f0b19111036bf95cc8a6') }
---
shards
[
  {
    _id: 'rs0',
    host: 'rs0/mongo-0-a:27018',
    state: 1,
    topologyTime: Timestamp({ t: 1709568222, i: 2 })
  },
  {
    _id: 'rs1',
    host: 'rs1/mongo-1-a:27018',
    state: 1,
    topologyTime: Timestamp({ t: 1709568229, i: 2 })
  }
]
---
active mongoses
[ { '7.0.6': 1 } ]
---
autosplit
{ 'Currently enabled': 'yes' }
---
balancer
{
  'Currently enabled': 'yes',
  'Currently running': 'no',
  'Failed balancer rounds in last 5 attempts': 0,
  'Migration Results for the last 24 hours': { '1': 'Success' }
}
---
databases
[
  {
    database: {
      _id: 'anotherDB',
      primary: 'rs0',
      partitioned: false,
      version: {
        uuid: UUID('583c1ef7-36bf-487d-8f21-1ce9819d7091'),
        timestamp: Timestamp({ t: 1709603393, i: 10015 }),
        lastMod: 2
      }
    },
    collections: {}
  },
  {
    database: { _id: 'config', primary: 'config', partitioned: true },
    collections: {
      'config.system.sessions': {
        shardKey: { _id: 1 },
        unique: false,
        balancing: true,
        chunkMetadata: [ { shard: 'rs0', nChunks: 1 } ],
        chunks: [
          { min: { _id: MinKey() }, max: { _id: MaxKey() }, 'on shard': 'rs0', 'last modified': Timestamp({ t: 1, i: 0 }) }
        ],
        tags: []
      }
    }
  },
  {
    database: {
      _id: 'test',
      primary: 'rs1',
      partitioned: false,
      version: {
        uuid: UUID('4494c13a-cfe4-43a3-a071-81b0529fffe1'),
        timestamp: Timestamp({ t: 1709568305, i: 1 }),
        lastMod: 1
      }
    },
    collections: {
      'test.foo': {
        shardKey: { code: 1 },
        unique: false,
        balancing: true,
        chunkMetadata: [ { shard: 'rs0', nChunks: 1 }, { shard: 'rs1', nChunks: 1 } ],
        chunks: [
          { min: { code: MinKey() }, max: { code: 519 }, 'on shard': 'rs0', 'last modified': Timestamp({ t: 2, i: 0 }) },
          { min: { code: 519 }, max: { code: MaxKey() }, 'on shard': 'rs1', 'last modified': Timestamp({ t: 2, i: 1 }) }
        ],
        tags: []
      }
    }
  }
]

If shard rs1 is unavailable then:

  1. Any query on test.foo that needs the chunk range code: 519 up to MaxKey will fail. (Those chunks are on rs1.)

    example
    [direct: mongos] test> db.getSiblingDB('test').foo.aggregate([{$sortByCount:'$code'}])
    MongoServerError[FailedToSatisfyReadPreference]: Could not find host matching read preference { mode: "primary" } for set rs1
    
    example
     [direct: mongos] test> db.getSiblingDB('test').foo.aggregate([{$match:{code:{$gte:519}}},{$sortByCount:'$code'}])
     MongoServerError[FailedToSatisfyReadPreference]: Could not find host matching read preference { mode: "primary" } for set rs1
    
    
  2. Any query on unsharded collections in the test database will fail. (rs1 is the primary shard for test.)

    example
    [direct: mongos] test> db.getSiblingDB('test').bar.find({})
    MongoServerError[FailedToSatisfyReadPreference]: Encountered non-retryable error during query :: caused by :: Could not find host matching read preference { mode: "primary" } for set rs1
    
    
  3. Queries on anotherDB will succeed. (rs0 is the primary shard for anotherDB.)

    example
    [direct: mongos] test> db.getSiblingDB('anotherDB').bar.aggregate({$sortByCount:'$f'})
    [
      { _id: 0, count: 3334 },
      { _id: 1, count: 3333 },
      { _id: 2, count: 3333 }
    ]
    
    
  4. Queries on test.foo where code is between MinKey and 519 (exclusive) will succeed.

    example
    [direct: mongos] test> db.getSiblingDB('test').foo.aggregate([{$match:{code:{$lte:518}}},{$sortByCount:'$code'}])
    [
     { _id: 226, count: 11 }, { _id: 506, count: 10 },
     { _id: 208, count: 10 }, { _id: 156, count: 10 },
     { _id: 69, count: 10 },  { _id: 416, count: 10 },
     { _id: 132, count: 10 }, { _id: 9, count: 10 },
     { _id: 348, count: 10 }, { _id: 65, count: 10 },
     { _id: 323, count: 10 }, { _id: 141, count: 10 },
     { _id: 263, count: 10 }, { _id: 148, count: 10 },
     { _id: 499, count: 10 }, { _id: 288, count: 10 },
     { _id: 268, count: 10 }, { _id: 22, count: 10 },
     { _id: 442, count: 10 }, { _id: 374, count: 10 }
    ]
    Type "it" for more
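Point 4 generalizes: any query whose filter is bounded on the shard key so that mongos never has to target the unavailable shard’s chunk range will still succeed. A sketch of the same idea from the Node.js driver used in the question — the split point 519 comes from this example cluster’s sh.status(), not a general value:

```javascript
// Split point taken from this cluster's sh.status(): rs0 owns code MinKey..519.
const SPLIT_POINT = 519;

// A filter bounded on the shard key below the split point lets mongos
// target only rs0, so the query succeeds while rs1 is down.
const rs0OnlyFilter = { code: { $lt: SPLIT_POINT } };

// In the question's code this would be:
//   await shardedCollection.find(rs0OnlyFilter).toArray();
```

The catch, as the follow-up below notes, is that this only helps when you know in advance which chunk ranges the query needs.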
    
    

I have mentioned that I took the shard down deliberately to replicate this exact scenario, and I also addressed the replica set question. Kindly read the statement once again for clarity.

Thanks for the answer. The problem is that I don’t know in advance which shards the query may need data from. What I need is a way to skip the unavailable shard and still return the data from all the other shards that are available.

Say I run a query that fetches 5 records from 5 different shards and one of them is unavailable. I still need to get the other 4 records, which is not happening at the moment.
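One driver-level option that may do exactly this, offered as a suggestion rather than something confirmed in this thread: the find option allowPartialResults, which asks mongos to return results from the shards it can reach instead of failing the whole query when some shards are unavailable. Behaviour depends on server and driver version, so treat this as a sketch to test; the URI and names below are placeholders:

```javascript
// Sketch: query through mongos with allowPartialResults set, so documents
// from reachable shards are returned even while a shard is down.
// Connection string, database, and collection names are placeholders.
const findOptions = { allowPartialResults: true };

async function findFromAvailableShards(uri) {
  const { MongoClient } = require("mongodb"); // official Node.js driver
  const client = new MongoClient(uri);
  try {
    return await client
      .db("mydb")
      .collection("myShardedCollection")
      .find({}, findOptions) // may return partial results instead of erroring
      .toArray();
  } finally {
    await client.close();
  }
}
```

If only some shards responded, newer servers also report a partialResultsReturned flag in the cursor response, which can be used to detect that records are missing.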