How can the query targeting counters differ so much between two MongoDB secondaries?

Hi,

I have a cluster with 3 members. The queries for the public API go to the secondaries and the queries for the developer dashboard / management UI go to the primary node to distribute the load.

There is no other read preference setting, such as "nearest", configured.

I would expect the two secondaries to show similar characteristics regarding load and query targeting, but they are totally different:

You can see it here: MongoDB Free Monitoring

I thought the index might be corrupt, but my test queries produce the same results: I run a test query with explain() on each secondary individually.
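For reference, this is roughly the comparison I do. It is only a sketch: the dicts below stand in for real `explain("executionStats")` output (the numbers are placeholders, not actual server output), while the field names `totalKeysExamined`, `totalDocsExamined` and `nReturned` are the standard executionStats counters:

```python
# Sketch: summarize the executionStats of the same explain() run on each
# secondary. The dicts are placeholders for real explain() output.

def summarize(explain_output):
    """Pull the counters that matter for query targeting."""
    stats = explain_output["executionStats"]
    return (stats["totalKeysExamined"],
            stats["totalDocsExamined"],
            stats["nReturned"])

explain_from_mongo_1 = {"executionStats": {
    "totalKeysExamined": 6, "totalDocsExamined": 5, "nReturned": 0}}
explain_from_mongo_2 = {"executionStats": {
    "totalKeysExamined": 6, "totalDocsExamined": 5, "nReturned": 0}}

# Identical summaries on both secondaries -> the plans behave the same.
print(summarize(explain_from_mongo_1) == summarize(explain_from_mongo_2))  # True
```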

Is there anything I can do to check what is going on here?

EDIT: I just restarted mongo-1; you might see it in the charts.
EDIT 2: The primary has not changed in the last 7 days.
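As a sketch of what the charts are comparing: as far as I understand, the query targeting metric is derived from the serverStatus counters `metrics.queryExecutor.scanned`, `metrics.queryExecutor.scannedObjects` and `metrics.document.returned`. The same ratio computed by hand (all numbers are made up; the counters are cumulative, so in practice you would diff two snapshots per node):

```python
# Sketch: compute the "query targeting" ratios from serverStatus-style
# metrics. Field names follow serverStatus().metrics; the sample numbers
# below are invented, not taken from the real cluster.

def targeting_ratios(metrics):
    """Return (keys scanned / docs returned, docs scanned / docs returned)."""
    returned = metrics["document"]["returned"] or 1  # avoid division by zero
    scanned_keys = metrics["queryExecutor"]["scanned"]
    scanned_docs = metrics["queryExecutor"]["scannedObjects"]
    return scanned_keys / returned, scanned_docs / returned

# Hypothetical snapshots from the two secondaries:
secondary_1 = {"queryExecutor": {"scanned": 120_000, "scannedObjects": 90_000},
               "document": {"returned": 30_000}}
secondary_2 = {"queryExecutor": {"scanned": 4_800_000, "scannedObjects": 4_500_000},
               "document": {"returned": 30_000}}

print(targeting_ratios(secondary_1))  # (4.0, 3.0)
print(targeting_ratios(secondary_2))  # (160.0, 150.0)
```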

I also see very weird entries in the profiler:

{
    "op" : "query",
    "ns" : "SquidexContentV3.States_Contents_Published3",
    "command" : {
        "find" : "States_Contents_Published3",
        "filter" : {
            "mt" : {
                "$exists" : true
            },
            "id" : {
                "$exists" : true
            },
            "_ai" : "16a2ca1c-6871-4a8d-9c21-98d47652355c",
            "_si" : {
                "$in" : [
                    "5a050a0a-ee27-45a5-958d-39a4adfc2802"
                ]
            },
            "dl" : {
                "$ne" : true
            },
            "do.Slug.iv" : "BandeauListeAnnonce",
            "do.Site.iv" : "Amivac"
        },
        "sort" : {
            "mt" : -1,
            "id" : 1
        },
        "limit" : 200,
        "$db" : "SquidexContentV3",
        "$readPreference" : {
            "mode" : "secondary"
        },
        "lsid" : {
            "id" : UUID("2284bc5a-ca9d-4093-b1f2-ff56e12cb13a")
        },
        "$clusterTime" : {
            "clusterTime" : Timestamp(1656406366, 10),
            "signature" : {
                "hash" : BinData(0, "AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                "keyId" : 0
            }
        }
    },
    "keysExamined" : 6,
    "docsExamined" : 5,
    "hasSortStage" : true,
    "cursorExhausted" : true,
    "numYield" : 0,
    "nreturned" : 0,
    "queryHash" : "A13373FD",
    "planCacheKey" : "41B8B1E5",
    "locks" : {
        "ReplicationStateTransition" : {
            "acquireCount" : {
                "w" : 2
            }
        },
        "Global" : {
            "acquireCount" : {
                "r" : 2
            }
        },
        "Database" : {
            "acquireCount" : {
                "r" : 1
            }
        },
        "Collection" : {
            "acquireCount" : {
                "r" : 1
            }
        },
        "Mutex" : {
            "acquireCount" : {
                "r" : 1
            }
        }
    },
    "flowControl" : {

    },
    "storage" : {

    },
    "responseLength" : 257,
    "protocol" : "op_msg",
    "millis" : 556,
    "planSummary" : "IXSCAN { _si: 1, dl: 1, mt: -1 }",
    "execStats" : {
        "stage" : "CACHED_PLAN",
        "nReturned" : 0,
        "executionTimeMillisEstimate" : 0,
        "works" : 1,
        "advanced" : 0,
        "needTime" : 0,
        "needYield" : 0,
        "saveState" : 0,
        "restoreState" : 0,
        "isEOF" : 1,
        "inputStage" : {
            "stage" : "SORT",
            "nReturned" : 0,
            "executionTimeMillisEstimate" : 0,
            "works" : 7,
            "advanced" : 0,
            "needTime" : 6,
            "needYield" : 0,
            "saveState" : 0,
            "restoreState" : 0,
            "isEOF" : 1,
            "sortPattern" : {
                "mt" : -1,
                "id" : 1
            },
            "memLimit" : 104857600,
            "limitAmount" : 200,
            "type" : "simple",
            "totalDataSizeSorted" : 0,
            "usedDisk" : false,
            "inputStage" : {
                "stage" : "FETCH",
                "filter" : {
                    "$and" : [
                        {
                            "mt" : {
                                "$exists" : true
                            }
                        },
                        {
                            "_ai" : {
                                "$eq" : "16a2ca1c-6871-4a8d-9c21-98d47652355c"
                            }
                        },
                        {
                            "do.Site.iv" : {
                                "$eq" : "Amivac"
                            }
                        },
                        {
                            "do.Slug.iv" : {
                                "$eq" : "BandeauListeAnnonce"
                            }
                        },
                        {
                            "id" : {
                                "$exists" : true
                            }
                        }
                    ]
                },
                "nReturned" : 0,
                "executionTimeMillisEstimate" : 0,
                "works" : 6,
                "advanced" : 0,
                "needTime" : 5,
                "needYield" : 0,
                "saveState" : 0,
                "restoreState" : 0,
                "isEOF" : 1,
                "docsExamined" : 5,
                "alreadyHasObj" : 0,
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "nReturned" : 5,
                    "executionTimeMillisEstimate" : 0,
                    "works" : 6,
                    "advanced" : 5,
                    "needTime" : 0,
                    "needYield" : 0,
                    "saveState" : 0,
                    "restoreState" : 0,
                    "isEOF" : 1,
                    "keyPattern" : {
                        "_si" : 1,
                        "dl" : 1,
                        "mt" : -1
                    },
                    "indexName" : "_si_1_dl_1_mt_-1",
                    "isMultiKey" : false,
                    "multiKeyPaths" : {
                        "_si" : [

                        ],
                        "dl" : [

                        ],
                        "mt" : [

                        ]
                    },
                    "isUnique" : false,
                    "isSparse" : false,
                    "isPartial" : false,
                    "indexVersion" : 2,
                    "direction" : "forward",
                    "indexBounds" : {
                        "_si" : [
                            "[\"5a050a0a-ee27-45a5-958d-39a4adfc2802\", \"5a050a0a-ee27-45a5-958d-39a4adfc2802\"]"
                        ],
                        "dl" : [
                            "[MinKey, true)",
                            "(true, MaxKey]"
                        ],
                        "mt" : [
                            "[MaxKey, MinKey]"
                        ]
                    },
                    "keysExamined" : 6,
                    "seeks" : 1,
                    "dupsTested" : 0,
                    "dupsDropped" : 0
                }
            }
        }
    },
    "ts" : ISODate("2022-06-28T08:52:46.742+0000"),
    "client" : "10.0.0.10",
    "allUsers" : [

    ],
    "user" : ""
}

No yields, no lock contention, zero results, and an index scan, yet the query still takes over half a second.
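Reading the numbers back out of the entry (a small sanity check; the values are copied from the profiler document above):

```python
# Values copied from the profiler entry above.
entry = {"keysExamined": 6, "docsExamined": 5, "nreturned": 0,
         "millis": 556, "executionTimeMillisEstimate": 0}

examined = entry["keysExamined"] + entry["docsExamined"]
unaccounted_ms = entry["millis"] - entry["executionTimeMillisEstimate"]

# Only 11 index keys + documents were touched, yet ~556 ms are not
# accounted for by plan execution itself, so the time is spent elsewhere.
print(examined, unaccounted_ms)  # 11 556
```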

Hi @Sebastian_Stehle

Are you still seeing different metrics for query targeting lately? Since the metrics just report the facts about what is happening in the server, I would assume there are two different query workloads that happen to target the different secondaries. If you have a look at the mongod logs for that timeframe, you'll get more information about the different queries and can trace them back to the originating application.

About the profiler entry: I see that the query takes about 500 ms to finish. It may not need to yield, and it returns zero results because the query predicates don't match anything, but it still needs to do work to determine that. I don't have enough information to say why it takes 500 ms; it's usually due to the circumstances surrounding the server's situation during execution. Are you seeing this query regularly in the logs? If yes, how often?
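One way to answer the "how often" question, assuming MongoDB 4.4+ structured (JSON) logging, where slow operations are logged with msg "Slow query" and carry `attr.queryHash` / `attr.durationMillis` (the log lines below are synthetic, not from this cluster):

```python
import json
from collections import Counter, defaultdict

# Synthetic log lines; real mongod 4.4+ JSON logs have many more fields.
log_lines = [
    '{"msg":"Slow query","attr":{"queryHash":"A13373FD","durationMillis":556}}',
    '{"msg":"Slow query","attr":{"queryHash":"A13373FD","durationMillis":610}}',
    '{"msg":"Connection accepted","attr":{}}',
]

counts = Counter()
durations = defaultdict(list)
for line in log_lines:
    doc = json.loads(line)
    if doc.get("msg") == "Slow query":
        query_hash = doc["attr"]["queryHash"]
        counts[query_hash] += 1
        durations[query_hash].append(doc["attr"]["durationMillis"])

print(dict(counts))     # {'A13373FD': 2}
print(dict(durations))  # {'A13373FD': [556, 610]}
```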

I would also note that while you can read from secondaries to distribute read load somewhat, please be careful: a replica set's primary purpose is to give you high availability, not scaling. If you distribute reads equally to the two secondaries and one of them goes offline, the remaining secondary is instantly hit with double the workload. If it cannot handle that surge, the application will time out. See Can I use more replica nodes to scale? for a more thorough discussion on this.
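The surge arithmetic in a nutshell (the read rate is a hypothetical number, purely for illustration):

```python
# Hypothetical steady-state read rate, split evenly across the secondaries.
total_reads_per_s = 10_000
secondaries = 2

per_node_before = total_reads_per_s / secondaries
per_node_after_one_fails = total_reads_per_s / (secondaries - 1)

# Losing one of two secondaries doubles the survivor's read load instantly.
print(per_node_before, per_node_after_one_fails)  # 5000.0 10000.0
```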

Best regards
Kevin