Repair fatal problem

Hello , anyone is here? Yesterday, our StandAlone mongo was killed by system and cannot restart. So I try to remove the mongod.lock and repair. But I got bellow problems. It seems I cannot continues to repair the data.
I will appreciate if anyone can check the error and give me some advises.

About the data_analysis.test_for_data_flow

the day before yesterday : db.find({time:{$…}}).forEach(function(x){db.test_for_data_flow.insert(x)}) => test_for_data_flow.find().count() = 40000000
yesterday : execute ~.aggregate([{$group:{_id:["$source","$target"]},sum:{$sum:1}}])

here is the log about test_for_data_flow before system kill the mongo.I guess the Repair Error is about the this index

Welcome to the MongoDB Community Forums @Jake_Hou !

Please provide some more details on your deployment:

  • specific version of MongoDB server (x.y.z)

  • O/S version

Some further questions:

  • Do you know why the mongod process was terminated? For example, is “killed by system” referring to something like the Out-Of-Memory (OOM) Killer" on Linux.

  • Are there any additional log messages prior to the unexpected shutdown? If you can provide in text format instead of images, that would be helpful.

  • Did the repair process indicate errors in any collections or indexes aside from test_for_data_flow?

  • Can you elaborate on the repair process you followed?

  • Did you take a backup of your dbPath contents before removing lock files or repairing?

Your second post suggests that an index build was in progress when the mongod process was restarted, and it looks like this index was affected by the unexpected mongod shutdown.

Regards,
Stennie

1 Like

Thanks Stennie, you are very kind and sorry reply late, I am a littlt bit busy.

Please check below info,

  • Mongo Version : V4.4.6
  • O/S : CentOS 7.6
  • Terminated : OOM , I use dmesg and get the info " Out of memory: Kill process 121839 (mongod) score 496 or sacrifice childstrong text "
  • Log Text about Test_for_data_flow :
    {"t":{"$date":"2021-11-19T15:34:24.686+08:00"},"s":"I",  "c":"INDEX",    "id":20438,   "ctx":"conn327586","msg":"Index build: registering","attr":{"buildUUID":{"uuid":{"$uuid":"820a0f1e-ed49-4b93-8c79-cab46040dafa"}},"namespace":"data_analysis.test_for_data_flow","collectionUUID":{"uuid":{"$uuid":"11c09284-614d-467f-81af-3de7a93012ff"}},"indexes":1,"firstIndex":{"name":"source_1"}}}
    {"t":{"$date":"2021-11-19T15:34:27.280+08:00"},"s":"I",  "c":"INDEX",    "id":20384,   "ctx":"IndexBuildsCoordinatorMongod-4","msg":"Index build: starting","attr":{"namespace":"data_analysis.test_for_data_flow","buildUUID":null,"properties":{"v":2,"key":{"source":1.0},"name":"source_1"},"method":"Hybrid","maxTemporaryMemoryUsageMB":200}}
  • Repair Process: just enter;

    mongod --dbpath /data/bk.mongo/ --directoryperdb --wiredTigerDirectoryForIndexes --repair

  • Backup: I took a backup of dbPath after repairing fault.

I thought all problem is from the unexcepted shutdown.

I found unexcepted shutdown again , and I restarted mongo, the log like below(i guess it is repair automatically even no --repair)

{"t":{"$date":"2021-11-24T13:47:03.969+08:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn253368","msg":"Connection ended","attr":{"remote":"127.0.0.1:44132","connectionId":253368,"connectionCount":86}}
{"t":{"$date":"2021-11-24T13:54:22.251+08:00"},"s":"I",  "c":"CONTROL",  "id":23285,   "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2021-11-24T13:54:22.740+08:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2021-11-24T13:54:22.760+08:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2021-11-24T13:54:22.761+08:00"},"s":"I",  "c":"STORAGE",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":162546,"port":27017,"dbPath":"/data/mongodb","architecture":"64-bit","host":"10-0-16-11"}}
{"t":{"$date":"2021-11-24T13:54:22.761+08:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.6","gitVersion":"72e66213c2c3eab37d9358d5e78ad7f5c1d0d0d7","openSSLVersion":"OpenSSL 1.0.1e-fips 11 Feb 2013","modules":[],"allocator":"tcmalloc","environment":{"distmod":"rhel70","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2021-11-24T13:54:22.761+08:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"CentOS Linux release 7.9.2009 (Core)","version":"Kernel 3.10.0-1160.el7.x86_64"}}}
{"t":{"$date":"2021-11-24T13:54:22.761+08:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/etc/mongod.conf","net":{"bindIp":"0.0.0.0","port":27017,"unixDomainSocket":{"enabled":false}},"processManagement":{"timeZoneInfo":"/usr/share/zoneinfo"},"security":{"authorization":"enabled"},"storage":{"dbPath":"/data/mongodb","directoryPerDB":true,"journal":{"enabled":true},"wiredTiger":{"engineConfig":{"directoryForIndexes":true}}}}}}
{"t":{"$date":"2021-11-24T13:54:22.762+08:00"},"s":"W",  "c":"STORAGE",  "id":22271,   "ctx":"initandlisten","msg":"Detected unclean shutdown - Lock file is not empty","attr":{"lockFile":"/data/mongodb/mongod.lock"}}
{"t":{"$date":"2021-11-24T13:54:22.803+08:00"},"s":"I",  "c":"STORAGE",  "id":22270,   "ctx":"initandlisten","msg":"Storage engine to use detected by data files","attr":{"dbpath":"/data/mongodb","storageEngine":"wiredTiger"}}
{"t":{"$date":"2021-11-24T13:54:22.804+08:00"},"s":"W",  "c":"STORAGE",  "id":22302,   "ctx":"initandlisten","msg":"Recovering data from the last clean checkpoint."}
{"t":{"$date":"2021-11-24T13:54:22.804+08:00"},"s":"I",  "c":"STORAGE",  "id":22315,   "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=128357M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],"}}
{"t":{"$date":"2021-11-24T13:54:23.767+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733263:767777][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4585 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:24.767+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733264:767606][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4586 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:26.034+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733266:33992][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4587 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:27.347+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733267:347907][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4588 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:28.554+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733268:554838][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4589 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:29.838+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733269:838371][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4590 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:31.109+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733271:109756][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4591 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:32.324+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733272:324505][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4592 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:33.539+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733273:539825][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4593 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:34.872+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733274:872655][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4594 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:36.068+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733276:68860][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4595 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:37.276+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733277:276850][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4596 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:38.675+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733278:675409][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4597 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:40.023+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733280:23160][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4598 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:41.234+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733281:234019][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4599 through 4616"}}
{"t":{"$date":"2021-11-24T13:54:42.456+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1637733282:455999][162546:0x7f913d312bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 4600 through 4616"}}

About unexpected mongod shutdown, I checked the log, mongo is killed by kernel. In our mongo.conf, we don’t set the cacheSizeGB, so we set cacheSizeGB now and wanna check the status, will it work in you exeperience ? Any more suggestions?

Thanks very much, Stennie.

Regards,
Jake

sorry , i don’t know the format well, the reply looks confusion.