How to use a JSON file to aggregate in MongoDB in Java?

khat33b · August 20, 2021, 1:14pm

I am using the Java MongoDB driver v4.3:

implementation group: 'org.mongodb', name: 'mongodb-driver-sync', version: '4.3.1'
implementation group: 'org.mongodb', name: 'bson', version: '4.3.1'

My aggregation pipelines are written in JSON files placed in the src/main/resources folder. The aggregate method only accepts a List<Bson>. After loading the file, how do I convert its contents into something I can pass to aggregate?

String fileName = "physics/test.json";
File file = new File(fileName);
MongoDatabase db = mongoClient.getDatabase(DATABASE_NAME);
MongoCollection collection = db.getCollection(collectionName);
// Convert file to List<Bson> ???
AggregateIterable sourceList = collection.aggregate(pipeline);

A sample of my JSON file pipeline is:
[{"$group":{"_id":{"segment":"$segment","invoice_id":{"$trim":{"input":"$invoice_id"}}},"qty":{"$sum":"$qty"}}}]

Jeffrey_Yemin · August 20, 2021, 2:09pm

Hi @khat33b

Something like this should work:

    String json = """
            [{"$group":{"_id":{"segment":"$segment","invoice_id":{"$trim":{"input":"$invoice_id"}}},"qty":{"$sum":"$qty"}}}]""";

    List<BsonDocument> pipeline = new BsonArrayCodec().decode(new JsonReader(json), DecoderContext.builder().build())
            .stream().map(BsonValue::asDocument)
            .collect(Collectors.toList());

    // With mongodb-driver-sync, MongoClient is an interface, so use the MongoClients factory
    MongoClient client = MongoClients.create();
    
    client.getDatabase("test").getCollection("test")
            .aggregate(pipeline).into(new ArrayList<>());
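
The snippet above parses a string literal. To get the same string from a file under src/main/resources, one option is to read it from the classpath. This is only a sketch using the standard library: the class name PipelineResource and its methods are made up for illustration, and it assumes the file is UTF-8 and is packaged on the classpath (for example as "physics/test.json").

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the MongoDB driver.
final class PipelineResource {

    // Loads a classpath resource (path relative to src/main/resources,
    // without a leading slash) and returns its content as a String.
    static String load(String resourcePath) {
        ClassLoader loader = PipelineResource.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found on classpath: " + resourcePath);
            }
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole stream and decodes it as UTF-8 text.
    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The result of PipelineResource.load("physics/test.json") can then be used in place of the json variable when constructing the JsonReader. Reading through the class loader rather than new File(...) also keeps working when the application is packaged as a jar.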

Regards,
Jeff

khat33b · August 20, 2021, 3:32pm

@Jeffrey_Yemin Is there a way to get the result of the aggregation as a Json String?

Jeffrey_Yemin · August 20, 2021, 4:20pm

There is no way for the driver to provide the results as a JSON array, but this will get you pretty close:

    import org.bson.json.JsonObject;
    // ...
    
    List<JsonObject> results = client.getDatabase("test").getCollection("test").withDocumentClass(JsonObject.class)
            .aggregate(pipeline).into(new ArrayList<>());
    
    for (JsonObject cur: results) {
        System.out.println(cur.getJson());
    }
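
If a single JSON array string is really needed, the per-document strings returned by getJson() can be joined by hand. A minimal sketch, assuming every element is already a valid JSON document; JsonArrayJoiner and toJsonArray are hypothetical names, not driver API.

```java
import java.util.List;

// Hypothetical helper, not part of the MongoDB driver.
final class JsonArrayJoiner {

    // Wraps already-serialized JSON documents in brackets, separated by commas.
    // No validation is done: each element is assumed to be valid JSON.
    static String toJsonArray(List<String> jsonDocuments) {
        return "[" + String.join(",", jsonDocuments) + "]";
    }
}
```

In the loop above, each cur.getJson() value would be collected into a List<String> and passed to toJsonArray once the loop finishes. This builds the whole array in memory, so it only suits result sets of moderate size.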

khat33b · August 23, 2021, 10:15am

@Jeffrey_Yemin This solved this problem although I am getting a List of BsonDocuments rather than JsonObjects. But I am facing another issue that all my aggregations which ran quite fast with NodeJS are running very slowly with Java. They are taking more than two minutes whereas in Node it was a second. Could you give me some ideas as to why this is happening?

khat33b · August 23, 2021, 10:16am

It was just Document not BsonDocument.

khat33b · August 23, 2021, 10:26am

@Jeffrey_Yemin This solved the problem. But I am facing another issue: all my aggregations, which ran quite fast with NodeJS, are running very slowly with Java. They take more than two minutes, whereas in Node they took about a second. Could you give me some ideas as to why this is happening?

Jeffrey_Yemin · August 23, 2021, 12:12pm

Make sure you’re tacking .withDocumentClass(JsonObject.class) on to your MongoCollection, as in the snippet above. It will change the type of the objects representing the documents in the collection.

As for the query performance, it’s unlikely that it’s caused by the driver. I’m guessing there is some subtle difference in the pipeline that is causing it. You can try running explain on the pipeline to see what the server is doing.

steevej · August 23, 2021, 12:31pm

Do not forget that what you get is a cursor. If you do not iterate over the cursor, no documents are retrieved from the server. The call to into() in your Java code does iterate and retrieve all the documents. If you post your Node code, we can try to spot the difference.
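
One way to check this is to time the aggregate(...) call and the into(...) call separately. The helper below is only a sketch using the standard library; Stopwatch and time are hypothetical names, and the driver calls appear only in the comment as an example of how it might be used.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of the MongoDB driver.
final class Stopwatch {

    // Runs the given work, prints how long it took, and returns its result.
    static <T> T time(String label, Supplier<T> work) {
        long start = System.nanoTime();
        T result = work.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + " took " + elapsedMs + " ms");
        return result;
    }

    // Possible usage with the driver (assumes collection and pipeline exist):
    //   AggregateIterable<Document> it =
    //       Stopwatch.time("aggregate", () -> collection.aggregate(pipeline));
    //   List<Document> docs =
    //       Stopwatch.time("into", () -> it.into(new ArrayList<>()));
}
```

If nearly all of the time is spent in the second step, the cost is in fetching and decoding every document, and the Node code may simply not have been draining the cursor.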

Jeffrey_Yemin · August 23, 2021, 12:53pm

I like that hypothesis, @steevej.

Thanks,
Jeff

khat33b · August 23, 2021, 1:33pm

@steevej and @Jeffrey_Yemin

I tried running it with various aggregations and I am getting the same delay. Even with a simple projection like this, it is taking around 2 minutes. Could it be related to the fact that I am using JSON and not the helper functions?

[
  {
    "$project":{
      "upc":1,
      "inventory_name":1,
      "street_date":1,
      "format":1,
      "sub_studio":1,
      "us_base":1,
      "us_srp":1,
      "disc_num":1,
      "modified_date":1
    }
  }
]

I tried running this explanation query:

Document explanation = collection.aggregate(pipeline).explain(ExplainVerbosity.EXECUTION_STATS);

List<Document> stages = explanation.get("stages", List.class);
List<String> keys = Arrays.asList("queryPlanner", "winningPlan");

for (Document stage : stages) {
    Document cursorStage = stage.get("$cursor", Document.class);
    if (cursorStage != null) {
        System.out.println(cursorStage.getEmbedded(keys, Document.class).toJson());
    }
}

The explanation for this aggregation is:

{"stage": "PROJECTION_SIMPLE", "transformBy": {"cidm_retailer_type": 1, "date_sk": 1, "extd_price": 1, "format": 1, "inventory_name": 1, "invoice_id": 1, "qty": 1, "retailer_id": 1, "retailer_name": 1, "street_date": 1, "sub_studio": 1, "upc": 1, "_id": 0}, "inputStage": {"stage": "COLLSCAN", "direction": "forward"}}