Slow simple limited query in MongoDb java driver with significant overhead in the returning result

There is a significant overhead between the time a simple mongo query with projection that is reported to have been executed and it’s conversion and return to the application with mongodb java driver. After quite a lot of searching and debugging, I managed to isolate part of the performance problem.

Concrete example

A query (extracted from log of the driver) {"find": "concepts_6523793785566", "filter": {"_id": {"$in": ["_pt2_product-instance_34_06", "_pt2_product_globals-instance_34_06"]}}, "projection": {"_id": 1, "typeClass": 1, "concepts": 1} executes (indication of mondgodb java driver) in 2.37 ms (which is slow, given that the collection has only 4 documents. See machine specs below, but similar results where obtain in other machines, inclusive Mac’s). But the query ( MongoCursor<Document> cursor = blder.cursor(); ) is encapsulated with Spring Stopwatch and it reports that it executed in 5.14 ms.

Note with the database profiler on the mongodb server with level 2 active, it reports the query executed in 0 ms, so the problem doesn’t seem to be in the actual database. Anyway the main focus of this issue is the approximately 2 a 3 ms between when the database returns the result and it is returns to the caller. This query will be executed thousands of time as it is used to simulate the graphlookup. We cannot use graphlookup from Mongodb due to its memory constrains and it’s inability to limit what is loaded.

Also that the query via mongo client returns a org.bson.Document in order to try and avoid the noise of conversion of POJO. In the real world example, an POJO representing the return is used and access is done with spring data (Repository and MongoTemplate)

I can provide a reduced version of our code that reproduces the problem.

Can anyone suggest why there is this overhead and how to reduce it. Or point me to where I may find this information.

It seem that there is a problem with the deserialization of result to bson document.

Output of executions (results of example version included)

The output of execution test should be something like (it is a bit verbose as it activates various logs from various components)

Starting application with args [[]]( 0)

... standard start of a spring boot ....

2021-09-07 12:15:10.802  INFO 884 --- [           main] c.n.t.t.CliRunnerTestMongoClient         : Starting run [0]
2021-09-07 12:15:10.906  INFO 884 --- [           main] c.n.t.t.CliRunnerTestMongoClient         : Ended run [0]

... Ignore run 0 as it includes opening connections ...

... 
        All tests run have the same layout:
        Start run [?]
        Logs from mongodb driver
        Resume of execution (via Spring stopwatch) of the various steps)
        Ended run [?]
        
        Below is an example of run[3].
...

2021-09-07 12:15:10.921  INFO 884 --- [           main] c.n.t.t.CliRunnerTestMongoClient         : Starting run [3]
2021-09-07 12:15:10.922 TRACE 884 --- [           main] org.mongodb.driver.connection            : Checked out connection [connectionId{localValue:3, serverValue:6}] to server localhost:27017
2021-09-07 12:15:10.923 DEBUG 884 --- [           main] org.mongodb.driver.protocol.command      : Sending command '{"find": "concepts_6523793785566", "filter": {"_id": {"$in": ["_pt2_product-instance_34_06", "_pt2_product_globals-instance_34_06"]}}, "projection": {"_id": 1, "typeClass": 1, "concepts": 1}, "comment": "testDirectMongoClientMultiWithProjectionsNoMapping", "$db": "testperf", "lsid": {"id": {"$binary": {"base64": "Rbl2Tk9vRxmnhVMX8X/wyw==", "subType": "04"}}}}' with request id 10 to database testperf on connection [connectionId{localValue:3, serverValue:6}] to server localhost:27017
2021-09-07 12:15:10.925 DEBUG 884 --- [           main] org.mongodb.driver.protocol.command      : Execution of command with request id 10 completed successfully in 2.37 ms on connection [connectionId{localValue:3, serverValue:6}] to server localhost:27017
2021-09-07 12:15:10.926 TRACE 884 --- [           main] org.mongodb.driver.connection            : Checked in connection [connectionId{localValue:3, serverValue:6}] to server localhost:27017
2021-09-07 12:15:10.927  WARN 884 --- [           main] c.n.t.t.CliRunnerTestMongoClient         : Run3 Slow query time StopWatch 'Run3': running time = 5146400 ns
---------------------------------------------
ns         %     Task name
---------------------------------------------
000345800  007%  Run3 Get initial information
000019500  000%  Run3 Build query - Get Collection
000005700  000%  Run3 Build query - Find
000002000  000%  Run3 Build query - Projection with comment
004767400  093%  Run3 Open cursor
000006000  000%  Run3 Read data from cursor.

2021-09-07 12:15:10.927  WARN 884 --- [           main] c.n.t.t.CliRunnerTestMongoClient         : Run3 Slow query time . Max 3 ms got 5 ms
2021-09-07 12:15:10.927  INFO 884 --- [           main] c.n.t.t.CliRunnerTestMongoClient         : Ended run [3]

... Closing down the application ...

Sample of data

{
  "_id": "fb25ecb9-7721-4864-b3c5-7439fad0180a",
  "typeClass": "CONCEPT_COLLECTION",
  "name": {
    "data": "Descrição de um acontecimento não obrigatório",
    "override": false,
    "_class": "com.nau21.metadata.domain.model.pojo.SimpleOverridableValue"
  },
  "label": {
    "data": "Descrição de um acontecimento não obrigatório",
    "override": false,
    "_class": "com.nau21.metadata.domain.model.pojo.SimpleOverridableValue"
  },
  "conceptType": {
    "data": [
      "CONCEPT"
    ],
    "override": false
  },
  "concepts": {
    "data": [
      {
        "referenceType": "ConceptRef",
        "uuid": "4207a37c-188c-4599-ad2c-0cd1697ff71e",
        "_class": "com.nau21.metadata.domain.model.mongodb.ConceptRefRecord"
      },
      {
        "referenceType": "ConceptRef",
        "uuid": "eb2cceb7-ca1d-42f7-9158-243e435fac4e",
        "_class": "com.nau21.metadata.domain.model.mongodb.ConceptRefRecord"
      },
      {
        "referenceType": "ConceptRef",
        "uuid": "0dfec7c0-0f73-4666-aad4-86674d3b151c",
        "_class": "com.nau21.metadata.domain.model.mongodb.ConceptRefRecord"
      }
    ],
    "override": false,
    "_class": "com.nau21.metadata.domain.model.mongodb.jackson.MixinConcept$SimpleOverridableValueListOrderedConceptRefData"
  },
  "basedOn": {
    "referenceType": "ConceptRef",
    "uuid": "98202924-5693-47a4-86be-e0a802679d22"
  },
  "definedIn": {
    "data": [],
    "override": false,
    "_class": "com.nau21.metadata.domain.model.pojo.SimpleOverridableValue"
  },
  "extraData": {
    "precalculated": {
      "areas": {
        "kindOfs": {
          "kindOfs": [
            {
              "referenceType": "ConceptRef",
              "uuid": "fb25ecb9-7721-4864-b3c5-7439fad0180a"
            },
            {
              "referenceType": "ConceptRef",
              "uuid": "98202924-5693-47a4-86be-e0a802679d22"
            },
            {
              "referenceType": "ConceptRef",
              "uuid": "5ab9800a-7411-408f-93f1-f8fb833b3419"
            }
          ],
          "_class": "com.nau21.metadata.domain.model.mongodb.PrecalculatedKindOfsRecord"
        }
      },
      "_class": "com.nau21.metadata.domain.model.mongodb.SimplePrecalculatedAreas"
    }
  },
  "_class": "com.nau21.metadata.domain.model.mongodb.ConceptRecord"
}

Test Machine Spec

Operating system - Windows 10 Pro 64-bit Version: 19043.1165
Microprocessor - Intel(R) Core™ i7-10750H CPU @ 2.60GHz 6 core
System memory - 32 GB
Monogodb java driver - 4.3.1
Mongodb server - MongoDB 4.4.4 Community

MongoDB server is running in docker container (linux), using WSL2.

2 Likes

Hi Paulo_nunes_de_bastos

I’m also using spring data mongoDB and came across similar issue of executing query and converting result to POJO, and have been watching this issue to see if some can answer this question.

In my case my code looks like this

AggregationResult Result= mongotemplate.aggregate(aggregationQuery, collectionName, MyClass.class)

When I run the aggregation query in studio3T it returns 500k results with each document has less than 10 fields,
In less than 2 seconds.

When I run this from my application it takes 1 to 2 minutes.

My query is Very simple, a match stage followed by a project stage

Match stage is using index provided on the collection.

After a lot of research I’m also trying to see how to make this reading of data in Java application using spring data mongodb driver faster.

Hope to see some one responds here.

Hi Paulo, there are a lot of moving parts here.

If you want to measure how fast a query run on a Java driver that runs on non-precompile mode (which likely is yours - if that’s HotSpot JVM which is the most commonly used), the fair measure would be to warm up the JVM by running many queries in a loop, e.g. 10000 times, before you start measuring how many milliseconds it runs for a single query.
Measure like 1000 times, and take the average.

JVM is usually fast, but also usually slow the first time it runs due to by default it runs as a bytecode interpreter (which is slower), and only when it hits certain threshold, the JVM will halt the execution and compile that part into native code, after which it will be much faster.

You might as well be measuring how long the JVM actually performing the (one-time) class loading, or probably how long it stops and compile from bytecode to native. By warming up the JVM, it will ensure that the class loading, the compilation, and all other overheads are excluded from your measurement.

The first loading of the class may cause the 2-3 ms slowness you experience, because the class loader might also load some other class dependencies.

Also if the code doesn’t follow best practices and produce heap memory leakage (e.g. keep allocating objects), you may also experience some stop world GC which will saturate your measurement.

Hi Rakesh, thank you for the comment.

The issue you stated may stem from the same issue, only that with Spring and probably the ORM framework might amplify the JVM overhead, and with Spring there will be more moving parts, and more overheads.
Follow the same guideline above when measuring (warm up the JVM first).

I would suggest however, to measure the performance of the pure Java driver first before measuring the performance of your whole stack that uses Spring. That way you will know exactly whether the ORM framework produces the high overhead.

Another thing to watch, make sure that the code doesn’t perform login operation each time, and instead uses the Java driver connection pooling.