Cannot bring the server up

Hello,

I’m about to start the Lab: Shard a Collection, but when I try to bring up the first replica set (m103-repl), every mongod that I try to start gets stuck (screenshot attached).

It doesn’t fork the process, and it doesn’t release the terminal. If I open a new terminal and check the running processes, I find several mongod processes (screenshot attached), even though I ran mongod only once, with the mongod-repl-1.conf file.

I imagine that this is happening because it is under the mongos service, but when I run mongos with its configuration file, the same happens.

Then I ran a standard mongod session and it worked all right, so I imagine that this is happening because of the configuration files.

Thank you for your help.

Enrique.

mongod-repl-1.conf

net:
  port: 27001
  bindIp: localhost,192.168.103.100
security:
  authorization: enabled
  keyFile: /var/mongodb/pki/m103-keyfile
storage:
  dbPath: /var/mongodb/db/1
  wiredTiger:
    engineConfig:
      cacheSizeGB: .1
systemLog:
  path: /var/mongodb/db/mongod1.log
  destination: file
  logAppend: true
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 50
processManagement:
  fork: true
replication:
  replSetName: m103-repl
sharding:
  clusterRole: shardsvr

mongos.conf

sharding:
  configDB: m103-csrs/192.168.103.100:26001,192.168.103.100:26002,192.168.103.100:26003
security:
  keyFile: /var/mongodb/pki/m103-keyfile
net:
  bindIp: localhost,192.168.103.100
  port: 26000
systemLog:
  destination: file
  path: /var/mongodb/db/mongos.log
  logAppend: true
processManagement:
  fork: true

There is something really wrong with your setup.

You have 3 mongod processes running with the same configuration file. That is, they all access the same directory and listen on the same port.

Provide the output of ss -tlnp.

Also include the last 30 lines of the log file.
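
For reference, both pieces of information can be collected roughly like this. It is only a sketch: the default log path is the systemLog.path from the posted mongod-repl-1.conf, and show_diagnostics is an illustrative helper name, not something from the course.

```shell
# Hypothetical helper: print listening TCP sockets and the tail of a log file.
# The default log path is the systemLog.path from the posted mongod-repl-1.conf.
show_diagnostics() {
  local log_file="${1:-/var/mongodb/db/mongod1.log}"
  local lines="${2:-30}"

  # ss flags: -t TCP, -l listening sockets, -n numeric ports, -p owning process
  if command -v ss >/dev/null 2>&1; then
    ss -tlnp || echo "ss failed" >&2
  else
    echo "ss not found on this machine" >&2
  fi

  # Last N lines of the log, if the file exists
  if [ -f "$log_file" ]; then
    tail -n "$lines" "$log_file"
  else
    echo "log file not found: $log_file" >&2
  fi
}
```

Calling show_diagnostics with no arguments uses the mongod1.log path; pass another path (for example the mongos log) as the first argument.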

I pasted just one of the configuration files; each file has its own port and directory.

I’ve run a stable replica set, and also a stable config server replica set, plus the mongos.

But today, when I want to bring everything back up, the first mongod of the first replica set fails to start.

Here is the screenshot of cat /var/mongodb/db/mongod1.log

By looking at it, I can see that the problem is that it is not able to get connected to the other servers, which of course, are down.

So my question is: how can I bring everything back up once the mongods are configured as members of a replica set that is also connected to a config server replica set, with a mongos?

Thank you

I am sorry but this is not the case. Your ps output shows 3 mongod processes with the same config file.

That’s important

By looking at it carefully it looks like it is the process trying to fork.

By the error message in the log, it looks like it is the log of mongos. Because mongos knows about csrs while mongod does not. And from the output of ps the mongod for csrs are not running. Even mongos was not running.

More of an FYI:

PID = Process ID
PPID = Parent Process ID

They are sub-processes spawned by mongod indicating that the sub-processes haven’t completed (or are in a waiting state) for some reason. This is probably happening because it’s waiting for the Config server to come back online. Once the sub-processes complete, it would typically end up with one process with a PPID of 1.
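
To see which entry is the parent and which are the children, the PID and PPID columns can be printed side by side. This is a sketch using standard ps output options; filter_mongo_procs and list_mongo_procs are illustrative names only.

```shell
# Hypothetical helpers: show mongod/mongos processes with their PID and PPID.

# Reads "PID PPID COMMAND..." lines on stdin and keeps only mongod/mongos rows.
filter_mongo_procs() {
  awk '
    BEGIN { printf "%-8s %-8s %s\n", "PID", "PPID", "COMMAND" }
    $3 ~ /(^|\/)mongo[ds]$/ {
      pid = $1; ppid = $2
      $1 = ""; $2 = ""          # drop the two numeric columns
      sub(/^ +/, "")            # trim the gap they leave behind
      printf "%-8s %-8s %s\n", pid, ppid, $0
    }'
}

# Each -o field with a trailing "=" suppresses that column header.
list_mongo_procs() {
  ps -e -o pid= -o ppid= -o args= | filter_mongo_procs
}
```

A row whose PPID is the PID of another mongod row is a child of that process; a fully forked mongod normally shows a PPID of 1.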

Here is the output of ss -tlnp.

This is the end of mongos.log.

That is what gets my attention. I ran the following command only once:

mongod --config /vagrant/shared/mongod-repl-1.conf

The process freezes without releasing the terminal, and I get 2 to 3 processes running with the same command line as above.

So considering what 007_jb says, this mongod is trying to connect with the whole set and that is the reason why it doesn’t finish loading.

But my question is, how can I bring up all the sets when the servers are configured this way?

While it tries to reach the other servers, it just keeps looping.

This replica set was a shard before you shut it down. So if you try to bring up the replica set again, the config servers need to be up and running first; otherwise it will keep waiting.

NB: Even though it’s waiting indefinitely, it’s actually connected. Just hit Ctrl+D to come out of it and you will be able to connect to this node. I’ve mostly seen this happening with Shards (i.e. when a config server is involved), there are however a few other non-shard cases.
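
For anyone landing here later, the bring-up order can be sketched roughly as follows. Only mongod-repl-1.conf, mongos.conf and the /vagrant/shared directory appear in this thread; the csrs_*.conf and mongod-repl-2/3.conf file names are assumptions, so adjust them to your own files. By default the function only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# Sketch: start a sharded cluster in dependency order.
# CONF_DIR comes from the command shown in this thread. File names other than
# mongod-repl-1.conf and mongos.conf are assumed, not taken from the thread.
CONF_DIR="${CONF_DIR:-/vagrant/shared}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"            # print the command instead of running it
  else
    "$@"
  fi
}

start_cluster() {
  # 1. Config server replica set (m103-csrs) first
  for f in csrs_1.conf csrs_2.conf csrs_3.conf; do
    run mongod --config "$CONF_DIR/$f"
  done
  # 2. Then the shard replica set (m103-repl)
  for f in mongod-repl-1.conf mongod-repl-2.conf mongod-repl-3.conf; do
    run mongod --config "$CONF_DIR/$f"
  done
  # 3. mongos last, once config servers and shards are up
  run mongos --config "$CONF_DIR/mongos.conf"
}
```

Running start_cluster with the defaults lists seven commands in order, which is a cheap way to check the sequence before starting anything for real.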

That was the solution. I first need to bring up the config servers. I hadn’t tried this before, because it didn’t occur to me to check the log.

Thank you for your time.

Closing this thread as the issue has been resolved.