Same query fast on mongo shell, but slow on java mongo driver

KelvinYch · December 3, 2021, 3:03am

Hi, I have met an issue that querying on a large collection is really slow with java driver (~35seconds), but it can be finished almost instantly on mongo shell (~0.35second). It is really appreciated if anyone can give me some help.

The collection has around 200M rows of documents, and here is the example data

{
   “0”: 1,
   “1”: “1419291302798”,
   “2”: “1419291302798”,
   “3”: “1419291240000”,
   “4”: “Code”,
   “5”: “Invoice Number”,
   “6”: 17,
   “7”: 0,
   “8”: “Transaction Type”,
   “9”: “1”,
   “10”: “AB123:1”,
   “16”: “2014-12-22T16:00:00.000Z”,
   “_id”: “5498aaa6e4b0ba4cd3ac8c08”
}

The query on mongo shell is

db.transaction.find({
   “_id” : {
       “$gt” : ObjectId(“5498aaa4e4b0ba4cd3ac8c07”)
    }
}).projection({}).sort({_id:1}).limit(100).skip(0);

and the explain on mongo shell

   "queryPlanner":{
      "namespace":"db.transaction",
      "indexFilterSet":false,
      "parsedQuery":{
         "_id":{
            "$gt":“5498aaa4e4b0ba4cd3ac8c07”
         }
      },
      "queryHash":“CF3048B0”,
      "planCacheKey":“C17B2099”,
      "maxIndexedOrSolutionsReached":false,
      "maxIndexedAndSolutionsReached":false,
      "maxScansToExplodeReached":false,
      "winningPlan":{
         "stage":"LIMIT",
         "limitAmount":100,
         "inputStage":{
            "stage":"FETCH",
            "inputStage":{
               "stage":"IXSCAN",
               "keyPattern":{
                  "_id":1
               },
               "indexName":" *id* ",
               "isMultiKey":false,
               "multiKeyPaths":{
                  "_id":[
                     
                  ]
               },
               "isUnique":true,
               "isSparse":false,
               "isPartial":false,
               "indexVersion":2,
               "direction":"forward",
               "indexBounds":{
                  "_id":[
                     “(ObjectId(‘5498aaa4e4b0ba4cd3ac8c07’),
                     "ObjectId(‘ffffffffffffffffffffffff’)"
                  ]"”"
               ]
            }
         }
      }
   },
   "rejectedPlans":[
      
   ]
},
"command":{
   "find":"transaction",
   "filter":{
      "_id":{
         "$gt":“5498aaa4e4b0ba4cd3ac8c07”
      }
   },
   "limit":100,
   "singleBatch":false,
   "sort":{
      "_id":1
   },
   "projection":{
      
   },
   "$db":"db"
},
"serverInfo":{
   "host":"host",
   "port":27017,
   "version":“5.0.4”,
   "gitVersion":“62a84ede3cc9a334e8bc82160714df71e7d3a29e”
},
"serverParameters":{
   "internalQueryFacetBufferSizeBytes":104857600,
   "internalQueryFacetMaxOutputDocSizeBytes":104857600,
   "internalLookupStageIntermediateDocumentMaxSizeBytes":104857600,
   "internalDocumentSourceGroupMaxMemoryBytes":104857600,
   "internalQueryMaxBlockingSortMemoryUsageBytes":104857600,
   "internalQueryProhibitBlockingMergeOnMongoS":0,
   "internalQueryMaxAddToSetBytes":104857600,
   "internalDocumentSourceSetWindowFieldsMaxMemoryBytes":104857600
},
"ok":1
}

meanwhile, the java code is as followings

public List getById(String id, Integer size, Integer offset) {
    long startTime = System.currentTimeMillis();
    BasicDBObject criteria = new BasicDBObject();

    if (id != null)
        criteria.append(MongoDBUtil.key_id, new BasicDBObject(MongoDBUtil.query_greaterThan, new ObjectId(id)));

    Map<String, Integer> sortCriteria = new HashMap<>();
    sortCriteria.put(MongoDBUtil.key_id, MongoDBUtil.sortOrder_asc);

    List mongoDBObjects = mongoDBUtil.search(MongoKeys.collection_transaction, criteria, null, sortCriteria, size, offset);

    long endTime = System.currentTimeMillis();
    if (showQueryTime)
        log.debug("getByBrandIdNSortById DAO time: " + (endTime - startTime));

    List transactions = new ArrayList<>();
    if (mongoDBObjects != null && mongoDBObjects.size() > 0)
        for (MongoDBObject mongoDBObject : mongoDBObjects)

    transactions.add(pass(mongoDBObject));
    return transactions;
}

and the search function be called

public List search(String collection, Map<String, Object> criteria, Map<String, Integer> fields, Map<String, Integer> orderBy, Integer size, Integer offset) {

    BasicDBObject critObj = null;

    if (criteria != null)
        critObj = new BasicDBObject(criteria);

    DBCursor result = null;
    if (fields == null)
        result = mongo.getDB(db).getCollection(collection).find(critObj);
    else
        result = mongo.getDB(db).getCollection(collection).find(critObj, new BasicDBObject(fields));

    if (orderBy != null)
        result.sort(new BasicDBObject(orderBy));

    if (offset != null)
        result.skip(offset);
        
    if (size != null)
        result.limit(size);

    if (result != null && result.count() > 0) {
        List resultList = new ArrayList();
        for (DBObject obj : result.toArray())
            resultList.add(new MongoDBObject((BasicBSONObject) obj));
        return resultList;
    } else
        return null;
    }
}

I guess it should be running the same query as on mongo shell?

explain from the java driver

{
   "explainVersion":“1”,
   "queryPlanner":{
      "namespace":"db.transaction",
      "indexFilterSet":false,
      "parsedQuery":{
         "_id":{
            "$gt":{
               "$oid":“54ab2697e4b0210a3c109f08”
            }
         }
      },
      "maxIndexedOrSolutionsReached":false,
      "maxIndexedAndSolutionsReached":false,
      "maxScansToExplodeReached":false,
      "winningPlan":{
         "stage":"LIMIT",
         "limitAmount":10000,
         "inputStage":{
            "stage":"FETCH",
            "inputStage":{
               "stage":"IXSCAN",
               "keyPattern":{
                  "_id":1
               },
               "indexName":" *id* ",
               "isMultiKey":false
            },
            "isUnique":true,
            "isSparse":false,
            "isPartial":false,
            "indexVersion":2,
            "direction":"forward",
            "indexBounds":{
               "_id":[
                  "(ObjectId("54ab2697e4b0210a3c109f08")",
                  "ObjectId(""ffffffffffffffffffffffff"")"
               ]"
              ]
           }
        }
     }
  },
  ""rejectedPlans"":[
     
  ]
},
“executionStats”:{
“executionSuccess”:true,
“nReturned”:10000,
“executionTimeMillis”:602,
“totalKeysExamined”:10000,
“totalDocsExamined”:10000,
“executionStages”:{
“stage”:“LIMIT”,
“nReturned”:10000,
“executionTimeMillisEstimate”:575,
“works”:10001,
“advanced”:10000,
“needTime”:0,
“needYield”:0,
“saveState”:25,
“restoreState”:25,
“isEOF”:1,
“limitAmount”:10000,
“inputStage”:{
“stage”:“FETCH”,
“nReturned”:10000,
“executionTimeMillisEstimate”:575,
“works”:10000,
“advanced”:10000,
“needTime”:0,
“needYield”:0,
“saveState”:25,
“restoreState”:25,
“isEOF”:0,
“docsExamined”:10000,
“alreadyHasObj”:0,
“inputStage”:{
“stage”:“IXSCAN”,
“nReturned”:10000,
“executionTimeMillisEstimate”:6,
“works”:10000,
“advanced”:10000,
“needTime”:0,
“needYield”:0,
“saveState”:25,
“restoreState”:25,
“isEOF”:0,
“keyPattern”:{
“_id”:1
},
“indexName”:“ *id* ”,
“isMultiKey”:false,
            ""isUnique"":true,
            ""isSparse"":false,
            ""isPartial"":false,
            ""indexVersion"":2,
            ""direction"":""forward"",
            ""indexBounds"":{
               ""_id"":[
                  ""(ObjectId(""54ab2697e4b0210a3c109f08"")",
               "ObjectId(""ffffffffffffffffffffffff"")"
            ]"
               ]
            },
            ""keysExamined"":10000,
            ""seeks"":1,
            ""dupsTested"":0,
            ""dupsDropped"":0
         }
      }
   },
   ""allPlansExecution""\":[
      
   ]
},
“command”:{
“find”:“transaction”,
“filter”:{
“_id”:{
“$gt”:{
“$oid”:“54ab2697e4b0210a3c109f08”
}
}
},
“ntoreturn”:-10000,
“sort”:{
“_id”:1
}
},
“serverInfo”:{
“host”:“host”,
“port”:27017,
“version”:“5.0.4”,
“gitVersion”:“62a84ede3cc9a334e8bc82160714df71e7d3a29e”
},
“serverParameters”:{
“internalQueryFacetBufferSizeBytes”:104857600,
“internalQueryFacetMaxOutputDocSizeBytes”:104857600,
“internalLookupStageIntermediateDocumentMaxSizeBytes”:104857600,
“internalDocumentSourceGroupMaxMemoryBytes”:104857600,
“internalQueryMaxBlockingSortMemoryUsageBytes”:104857600,
“internalQueryProhibitBlockingMergeOnMongoS”:0,
“internalQueryMaxAddToSetBytes”:104857600,
“internalDocumentSourceSetWindowFieldsMaxMemoryBytes”:104857600
}
}

My spring version is 4.0.9 and mongo-java-driver 2.14.3
have tried upgrading to the latest version but had no luck to solve this problem.

I assume _id should be indexed and there is no indexing problem.

profiling log as follows

{
   "op":"command",
   "ns":"db.transaction",
   "command":{
      "count":"transaction",
      "query":{
         "_id":{
            "$gt":“54aa7804e4b0210a3c108bf8”
         }
      },
      "$db":"db",
      "lsid":{
         "id":“UUID(“2ea88fd1-7635-47c8-a8ca-a32971e49ba0”)”
      }
   },
   "keysExamined":193236588,
   "docsExamined":0,
   "numYield":193236,
   "queryHash":“650E7B4C”,
   "planCacheKey":“70FCF2DA”,
   "locks":{
      "ReplicationStateTransition":{
         "acquireCount":{
            "w":“1”
         }
      },
      "Global":{
         "acquireCount":{
            "r":“193238”
         }
      },
      "Mutex":{
         "acquireCount":{
            "r":“1”
         }
      }
   },
   "flowControl":{
      
   },
   "storage":{
      
   },
   "responseLength":45,
   "protocol":"op_msg",
   "millis":35004,
   "planSummary":"“COUNT_SCAN"{
      "_id":1
   },
   "execStats":{
      "stage":"COUNT",
      "nReturned":0,
      "executionTimeMillisEstimate":233,
      "works":193236588,
      "advanced":0,
      "needTime":193236587,
      "needYield":0,
      "saveState":193236,
      "restoreState":193236,
      "isEOF":1,
      "nCounted":193236587,
      "nSkipped":0,
      "inputStage":{
         "stage":"COUNT_SCAN",
         "nReturned":193236587,
         "executionTimeMillisEstimate":194,
         "works":193236588,
         "advanced":193236587,
         "needTime":0,
         "needYield":0,
         "saveState":193236,
         "restoreState":193236,
         "isEOF":1,
         "keysExamined":193236588,
         "keyPattern":{
            "_id":1
         },
         "indexName":" *id* ",
         "isMultiKey":false,
         "multiKeyPaths":{
            "_id":[
               
            ]
         },
         "isUnique":true,
         "isSparse":false,
         "isPartial":false,
         "indexVersion":2,
         "indexBounds":{
            "startKey":{
               "_id":“54aa7804e4b0210a3c108bf8”
            },
            "startKeyInclusive":false,
            "endKey":{
               "_id":"ffffffffffffffffffffffff"
            },
            "endKeyInclusive":true
         }
      }
   },
   "ts":"“2021-12-02T11":"19":19.880Z”,
   "client":"IP",
   "allUsers":[
      
   ],
   "user":"
}

The machine is running on PC with specs

CPU : 5600X
Memory : 32GB RAM
HDD : WD BLUE 6TB 5400RPM

It is really only happening on my java spring application, and it’s really appreciated if anyone can give me some hints. thanks a lot and sorry for the long post

steevej · December 3, 2021, 3:28am

The queries are different.

shell:

limit(100)
...
“stage”: “LIMIT”,
“limitAmount”: 100,

java:

// We do not know the value of the variable size
result.limit(size);

// but it is not 100
“stage”:“LIMIT”,
“limitAmount”:10000,

KelvinYch · December 3, 2021, 3:31am

sorry, it is my typo when conducting the explain, updated to limit 10000
still much faster than in java (~0.2 seconds)

{
  "explainVersion": "1",
  "queryPlanner": {
    "namespace": "db.transaction",
    "indexFilterSet": false,
    "parsedQuery": {
      "_id": {
        "$gt": "5498aaa4e4b0ba4cd3ac8c07"
      }
    },
    "queryHash": "CF3048B0",
    "planCacheKey": "C17B2099",
    "maxIndexedOrSolutionsReached": false,
    "maxIndexedAndSolutionsReached": false,
    "maxScansToExplodeReached": false,
    "winningPlan": {
      "stage": "LIMIT",
      "limitAmount": 10000,
      "inputStage": {
        "stage": "FETCH",
        "inputStage": {
          "stage": "IXSCAN",
          "keyPattern": {
            "_id": 1
          },
          "indexName": "_id_",
          "isMultiKey": false,
          "multiKeyPaths": {
            "_id": []
          },
          "isUnique": true,
          "isSparse": false,
          "isPartial": false,
          "indexVersion": 2,
          "direction": "forward",
          "indexBounds": {
            "_id": [
              "(ObjectId('5498aaa4e4b0ba4cd3ac8c07'), ObjectId('ffffffffffffffffffffffff')]"
            ]
          }
        }
      }
    },
    "rejectedPlans": []
  },
  "command": {
    "find": "transaction",
    "filter": {
      "_id": {
        "$gt": "5498aaa4e4b0ba4cd3ac8c07"
      }
    },
    "limit": 10000,
    "singleBatch": false,
    "sort": {
      "_id": 1
    },
    "projection": {},
    "$db": "db"
  },
  "serverInfo": {
    "host": "host",
    "port": 27017,
    "version": "5.0.4",
    "gitVersion": "62a84ede3cc9a334e8bc82160714df71e7d3a29e"
  },
  "serverParameters": {
    "internalQueryFacetBufferSizeBytes": 104857600,
    "internalQueryFacetMaxOutputDocSizeBytes": 104857600,
    "internalLookupStageIntermediateDocumentMaxSizeBytes": 104857600,
    "internalDocumentSourceGroupMaxMemoryBytes": 104857600,
    "internalQueryMaxBlockingSortMemoryUsageBytes": 104857600,
    "internalQueryProhibitBlockingMergeOnMongoS": 0,
    "internalQueryMaxAddToSetBytes": 104857600,
    "internalDocumentSourceSetWindowFieldsMaxMemoryBytes": 104857600
  },
  "ok": 1
}

KelvinYch · December 3, 2021, 3:32am

and the size should be 10000 and offset is 0 on java case

steevej · December 3, 2021, 3:41am

When you do

in the shell, you do not really get all 10000 documents. You only get the first batch. May be 20 documents or something like that. The last printed line is something like Type “it” for more.

KelvinYch · December 3, 2021, 3:58am

Yes, I have doubt is it faster because it is cursor and tested with the followings code

var cursors = db.transaction.find({
    "_id": {
        "$gt": ObjectId("5498aaa4e4b0ba4cd3ac8c07")
    }
}).projection({}).sort({ _id: 1 }).limit(10000).skip(0);

while (cursors.hasNext()) {
   print(tojson(cursors.next()));
}

However, it is much slower but still faster than in java(13 seconds)

and if without printing to output will be shortened into 1.4s

btw, thank you for giving me advice, I appreciate your help very much!

KelvinYch · December 3, 2021, 4:09am

sorry it is a wrong screencap

steevej · December 3, 2021, 3:57pm

Queries are executed by the server. A big difference of performance between one driver and another one is necessarily because you do something different between the 2 despite the fact that you think you do the same thing.

I already pointed you at the limit difference 100 vs 10000. Big difference. You came back with a change that makes the 2 drivers closer to each other.

I also mentioned that you were not processing any document from the nodejs driver. The two drivers got even closer with your update.

Let see what may cause more difference.

The _id used in your $gt is different from case to case. You have

Despite the fact that there is IXSCAN you may experience differences as one might already be in memory and the other has to be fetched from disk.

In your java code you are calling toArray() which puts all documents in a new array and then you created another new array by duplicating all the documents. That is extra work compared to js. In js you do not create 10000 element arrays.

KelvinYch · December 7, 2021, 4:10am

but I think this might not be the case, since,
even though I change both limits to 100, 10 or 3
java driver is still significantly slower than nodejs.

steevej · December 7, 2021, 1:54pm

Please read 4. Working with the JIT Compiler - Java Performance: The Definitive Guide [Book].

Give more details, how many iterations do you run?

Run is insufficient and detrimental to JIT as explained in previous link. You are testing for performance, then execute many time before asserting than one is slower or faster.
With same data or different data? Numbers for only 1 data point is insufficient. Especially, if the said data is in cache or note. Numbers with only 1 data point is insufficient because in real life you will have multiple data point. So you have to run multiple iterations with multiple data sets.
Do you shutdown the server between your JS and Java test? Having a working set in memory is very important for performance. If you do the java test before js test, then java test is hit by having to fetch data from disk and then js test has no fetch penalty.

You worry too much about the performance difference between java and js because in the end real life performance are influenced more by your schema including indexes and hardware. Both java and js are interpreted. If you think language make such a big difference in a database application, then consider rust, go, c or c++. But remember that the hard work is supposed to be done by the server.

I have doubts about your test setup. I am sure you will come with some corrections like you did for all the other factors I mentioned in my other posts. But that is not the point.

Choose the language you are the most comfortable with and address performance issues when and if they appear. Make it right then make if fast.