Team,
We have an issue in one of our lower environments, which runs MongoDB Community 4.2.3. When %iowait increases, the server crashes, and we see CPU spikes as well. The log file shows that every insert takes roughly 30000 ms to 40000 ms or more, and a count query that uses an index scan takes roughly 20000 ms. At other times, the same write and read operations complete in 0 ms. During the %iowait spikes, we also see a "serverStatus was very slow" warning. Memory and CPU utilization stay low, but %iowait keeps increasing until the server becomes unresponsive. We have discussed optimizing the read and write operations with the dev team, but I am not sure whether the cause is the disk, slow queries, or a need for more RAM.
Following are some iostat snapshots:
%iowait: 25    | mongod %cpu: 0.3 | mongod %mem: 2.2
%iowait: 12    | mongod %cpu: 0.3 | mongod %mem: 2.2
%iowait: 37.48 | mongod %cpu: 0.3 | mongod %mem: 2.2
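To record %iowait next to the slow-log timestamps, a small sampler like the one below could be left running. This is a minimal sketch, assuming a Linux host: it reads the aggregate `cpu` line of /proc/stat twice and divides the change in iowait jiffies by the change in total jiffies, which approximates how iostat derives %iowait. The function names are hypothetical.

```python
import time


def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest/guest_nice are already included in user/nice, so only the first
    # eight counters go into the total.
    b, a = before[:8], after[:8]
    total = sum(a) - sum(b)
    if total <= 0:
        return 0.0
    return 100.0 * (a[4] - b[4]) / total


def sample_iowait(interval=5.0):
    """Sample /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_cpu_line(f.readline())
    return iowait_percent(before, after)


# Usage on a Linux host: print(f"%iowait: {sample_iowait():.2f}")
```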
Insert query:
2020-06-25T12:13:39.322+0000 I COMMAND [conn64] command ns.collecitionname command: insert { insert: "collectionname", ordered: true, $db: "xxx" } ninserted:1 keysInserted:19 numYields:0 reslen:45 locks:{ ParallelBatchWriterMode: { acquireCount: { r: 1 } }, ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } }, Mutex: { acquireCount: { r: 2 } } } flowControl:{ acquireCount: 1 } storage:{ data: { bytesRead: 15367, timeReadingMicros: 81871983 } } protocol:op_msg 81872ms
Read query:
2020-06-24T14:02:52.424+0000 I COMMAND [conn25] command ns.collectionanme command: count { count: "collectionanme", query: { header.eventId: "da03d290-32ca-45ce-a3fb-0262b0ad96f2", _class: { $in: [ "com.charter.serviceactivation.milestone.model.Event" ] } }, limit: 1, $db: "MileStones" } planSummary: IXSCAN { header.eventId: 1 } keysExamined:0 docsExamined:0 numYields:0 queryHash:4DDDD3A7 planCacheKey:51072020 reslen:45 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 14345, timeReadingMicros: 30327760 } } protocol:op_msg 30327ms
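One way to check whether slow entries like the two above are dominated by disk reads is to compare `storage.data.timeReadingMicros` with the total duration at the end of the line. The helper below is a hypothetical sketch that assumes the 4.2 plain-text log format shown here (a trailing `<N>ms` duration and an optional `storage:{ data: { bytesRead: ..., timeReadingMicros: ... } }` section).

```python
import re

# Hypothetical helper for MongoDB 4.2 plain-text slow-operation log lines.
DURATION_RE = re.compile(r"(\d+)ms\s*$")
BYTES_READ_RE = re.compile(r"bytesRead: (\d+)")
READ_MICROS_RE = re.compile(r"timeReadingMicros: (\d+)")


def storage_read_share(line):
    """Return (duration_ms, bytes_read, read_ms, share) for one log line.

    share is the fraction of the operation's duration spent reading from
    storage. Lines without a storage section report 0 bytes and 0 ms.
    """
    duration_match = DURATION_RE.search(line)
    if duration_match is None:
        raise ValueError("no trailing '<N>ms' duration found")
    duration_ms = int(duration_match.group(1))

    bytes_match = BYTES_READ_RE.search(line)
    micros_match = READ_MICROS_RE.search(line)
    bytes_read = int(bytes_match.group(1)) if bytes_match else 0
    read_ms = int(micros_match.group(1)) / 1000.0 if micros_match else 0.0

    share = read_ms / duration_ms if duration_ms else 0.0
    return duration_ms, bytes_read, read_ms, share
```

For the two entries above, the share comes out at about 1.0 in both cases: the insert reports timeReadingMicros 81871983 (about 81872 ms) against a total of 81872ms, and the count reports 30327760 (about 30328 ms) against 30327ms, while bytesRead is only 15367 and 14345 bytes. If that reading of the field is right, nearly all of each operation's time went to waiting on storage reads of a few kilobytes rather than to CPU work.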
serverStatus warning:
2020-06-25T16:00:18.740+0000 I COMMAND [ftdc] serverStatus was very slow: { after basic: 0, after asserts: 0, after connections: 0, after electionMetrics: 0, after extra_info: 0, after flowControl: 0, after globalLock: 0, after locks: 0, after logicalSessionRecordCache: 0, after network: 0, after opLatencies: 0, after opReadConcernCounters: 0, after opcounters: 0, after opcountersRepl: 0, after oplogTruncation: 0, after repl: 0, after security: 0, after storageEngine: 0, after tcmalloc: 0, after trafficRecording: 0, after transactions: 0, after transportSecurity: 0, after twoPhaseCommitCoordinator: 0, after wiredTiger: 0, at end: 74739 }
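To see which part of serverStatus the time went to, the `after <section>` / `at end` timings in the warning above can be compared step by step. This is a minimal sketch that assumes each value is cumulative milliseconds since serverStatus started, so a step's own cost is its value minus the previous one; the function name is hypothetical.

```python
import re

# Matches "after <section>: <ms>" and "at end: <ms>" pairs in the
# "serverStatus was very slow" warning.
STEP_RE = re.compile(r"(after \w+|at end): (\d+)")


def slowest_serverstatus_step(line):
    """Return (step_name, step_ms) for the largest jump between timings.

    Assumption: each value is cumulative milliseconds since serverStatus
    started, so a step's own cost is its value minus the previous one.
    Returns (None, -1) if no timings are found.
    """
    worst_name, worst_ms, previous = None, -1, 0
    for name, value in STEP_RE.findall(line):
        elapsed = int(value)
        step_ms = elapsed - previous
        if step_ms > worst_ms:
            worst_name, worst_ms = name, step_ms
        previous = elapsed
    return worst_name, worst_ms
```

For the warning above, every section reports 0 and the whole 74739 ms appears between `after wiredTiger` and `at end`, so under that assumption none of the individual sections was the slow part.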
Any help with this issue would be highly appreciated.