Troubleshoot or Stop Long-Running Shard Migration

I dug up my old repo and made a new version of my “quick start” sharded cluster with Docker.

I used it to create a 3-shard cluster to play with, and in the output of my sh.status() command I get this:

[...]
shards
[
  {
    _id: 'shard1',
    host: 'shard1/mongod-s1-1:27018,mongod-s1-2:27018,mongod-s1-3:27018',
    state: 1,
    topologyTime: Timestamp({ t: 1670635728, i: 1 })
  },
  {
    _id: 'shard2',
    host: 'shard2/mongod-s2-1:27018,mongod-s2-2:27018,mongod-s2-3:27018',
    state: 1,
    topologyTime: Timestamp({ t: 1670635729, i: 1 })
  },
  {
    _id: 'shard3',
    host: 'shard3/mongod-s3-1:27018,mongod-s3-2:27018,mongod-s3-3:27018',
    state: 1,
    topologyTime: Timestamp({ t: 1670635729, i: 7 })
  }
]
[...]

Unsurprisingly, I’m getting {state: 1} on all 3 of my shards… So there is definitely something wrong here and I bet this is our issue.
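
If you want to compare this on your side without reading through the whole sh.status() output, here is a minimal sketch. It assumes, as my output suggests, that state: 1 is the value you want to see on every shard. The helper name shardsNotInState1 is made up for this example, and the sample array is just the three shards from my cluster.

```javascript
// Hypothetical helper (not part of mongosh): returns the _id of every shard
// document whose state field is anything other than 1 (including missing).
function shardsNotInState1(shards) {
  return shards.filter((s) => s.state !== 1).map((s) => s._id);
}

// Sample data shaped like the "shards" section of my sh.status() output.
const sampleShards = [
  { _id: 'shard1', state: 1 },
  { _id: 'shard2', state: 1 },
  { _id: 'shard3', state: 1 },
];

// Every shard reports state 1 here, so the result is an empty array.
console.log(shardsNotInState1(sampleShards));

// In mongosh, connected to a mongos, the same check against the real
// metadata would look like this (assumption: you can read the config db):
//   shardsNotInState1(db.getSiblingDB("config").shards.find().toArray())
```

Any shard _id that comes back from this is one I would look at first.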

Can you please make SURE that these 3 nodes are actually running with this option ON at the moment? Maybe the startup scripts were updated after the last restart of these 3 machines, and the mongod processes have been up for a long time without picking up the change.

To check, you can run this on your first replica set (RS):

shard1 [direct: primary] config> use local
switched to db local
shard1 [direct: primary] local> db.startup_log.find()
[
  {
    _id: 'mongod-s1-1-1670635632853',
    hostname: 'mongod-s1-1',
    startTime: ISODate("2022-12-10T01:27:12.000Z"),
    startTimeLocal: 'Sat Dec 10 01:27:12.853',
    cmdLine: {
      net: { bindIp: '*' },
      replication: { replSet: 'shard1' },
      sharding: { clusterRole: 'shardsvr' }
    },
    pid: Long("1"),
    buildinfo: {
      version: '6.0.3',
      gitVersion: 'f803681c3ae19817d31958965850193de067c516',
      modules: [],
      allocator: 'tcmalloc',
      javascriptEngine: 'mozjs',
      sysInfo: 'deprecated',
      versionArray: [ 6, 0, 3, 0 ],
      openssl: {
        running: 'OpenSSL 1.1.1f  31 Mar 2020',
        compiled: 'OpenSSL 1.1.1f  31 Mar 2020'
      },
      buildEnvironment: {
        distmod: 'ubuntu2004',
        distarch: 'x86_64',
        cc: '/opt/mongodbtoolchain/v3/bin/gcc: gcc (GCC) 8.5.0',
        ccflags: '-Werror -include mongo/platform/basic.h -ffp-contract=off -fasynchronous-unwind-tables -ggdb -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -fno-omit-frame-pointer -fno-strict-aliasing -O2 -march=sandybridge -mtune=generic -mprefer-vector-width=128 -Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-const-variable -Wno-unused-but-set-variable -Wno-missing-braces -fstack-protector-strong -fdebug-types-section -Wa,--nocompress-debug-sections -fno-builtin-memcmp',
        cxx: '/opt/mongodbtoolchain/v3/bin/g++: g++ (GCC) 8.5.0',
        cxxflags: '-Woverloaded-virtual -Wno-maybe-uninitialized -fsized-deallocation -std=c++17',
        linkflags: '-Wl,--fatal-warnings -pthread -Wl,-z,now -fuse-ld=gold -fstack-protector-strong -fdebug-types-section -Wl,--no-threads -Wl,--build-id -Wl,--hash-style=gnu -Wl,-z,noexecstack -Wl,--warn-execstack -Wl,-z,relro -Wl,--compress-debug-sections=none -Wl,-z,origin -Wl,--enable-new-dtags',
        target_arch: 'x86_64',
        target_os: 'linux',
        cppdefines: 'SAFEINT_USE_INTRINSICS 0 PCRE_STATIC NDEBUG _XOPEN_SOURCE 700 _GNU_SOURCE _FORTIFY_SOURCE 2 BOOST_THREAD_VERSION 5 BOOST_THREAD_USES_DATETIME BOOST_SYSTEM_NO_DEPRECATED BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS BOOST_ENABLE_ASSERT_DEBUG_HANDLER BOOST_LOG_NO_SHORTHAND_NAMES BOOST_LOG_USE_NATIVE_SYSLOG BOOST_LOG_WITHOUT_THREAD_ATTR ABSL_FORCE_ALIGNED_ACCESS'
      },
      bits: 64,
      debug: false,
      maxBsonObjectSize: 16777216,
      storageEngines: [ 'devnull', 'ephemeralForTest', 'wiredTiger' ]
    }
  }
]

And here, in my example, I can confirm that I have sharding: { clusterRole: 'shardsvr' } in my cmdLine.
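
Since startup_log can contain one entry per restart, it is easy to read the wrong one. Below is a small sketch that picks the most recent entry and pulls out the clusterRole. The helper name latestClusterRole is hypothetical, and the older entry in the sample is invented to show why picking the latest one matters; only the second entry mirrors my real output above.

```javascript
// Hypothetical helper (not part of mongosh): given startup_log documents,
// returns sharding.clusterRole from the most recent one, or null if the
// list is empty or the option is absent.
function latestClusterRole(entries) {
  if (entries.length === 0) return null;
  // Keep the entry with the greatest startTime (Date objects compare by value).
  const latest = entries.reduce((a, b) => (a.startTime >= b.startTime ? a : b));
  const sharding = (latest.cmdLine || {}).sharding || {};
  return sharding.clusterRole || null;
}

const sampleLog = [
  {
    // Invented older entry: started without any sharding option.
    startTime: new Date("2022-12-01T00:00:00.000Z"),
    cmdLine: { net: { bindIp: '*' }, replication: { replSet: 'shard1' } },
  },
  {
    // Trimmed copy of the entry from my output above.
    startTime: new Date("2022-12-10T01:27:12.000Z"),
    cmdLine: {
      net: { bindIp: '*' },
      replication: { replSet: 'shard1' },
      sharding: { clusterRole: 'shardsvr' },
    },
  },
];

console.log(latestClusterRole(sampleLog)); // shardsvr

// In mongosh, connected directly to one mongod of the shard:
//   latestClusterRole(db.getSiblingDB("local").startup_log.find().toArray())
```

If this returns null on one of your nodes, that mongod was most likely last started without the sharding option. You can also run db.serverCmdLineOpts() on the node to see the options of the process that is running right now.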

Note that all the outputs above are from MongoDB 6.0.3, and it’s definitely time for an update on your sharded clusters! :wink:

Cheers,
Maxime.