How to use a JSON file to aggregate in MongoDB in Java?

I am using the Java MongoDB driver v4.3:

implementation group: 'org.mongodb', name: 'mongodb-driver-sync', version: '4.3.1'
implementation group: 'org.mongodb', name: 'bson', version: '4.3.1'

I have my aggregation pipelines written in JSON files, which are placed in the src/main/resources folder. The aggregate function only accepts a List<Bson>. After fetching the file, how do I pass it into the MongoDB method?

String fileName = "physics/test.json";
File file = new File(fileName);
MongoDatabase db = mongoClient.getDatabase(DATABASE_NAME);
MongoCollection collection = db.getCollection(collectionName);
// Convert file to List<Bson> ???
AggregateIterable sourceList = collection.aggregate(pipeline);

A sample of my JSON file pipeline is:
[{"$group":{"_id":{"segment":"$segment","invoice_id":{"$trim":{"input":"$invoice_id"}}},"qty":{"$sum":"$qty"}}}]

Hi @khat33b

Something like this should work:

    String json = """
            [{"$group":{"_id":{"segment":"$segment","invoice_id":{"$trim":{"input":"$invoice_id"}}},"qty":{"$sum":"$qty"}}}]""";

    List<BsonDocument> pipeline = new BsonArrayCodec().decode(new JsonReader(json), DecoderContext.builder().build())
            .stream().map(BsonValue::asDocument)
            .collect(Collectors.toList());

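    // With mongodb-driver-sync, MongoClient is an interface; create it via com.mongodb.client.MongoClients
    MongoClient client = MongoClients.create();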
    
    client.getDatabase("test").getCollection("test")
            .aggregate(pipeline).into(new ArrayList<>());
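
For the file-loading half of the question, here is a minimal sketch that reads the pipeline text from the classpath, assuming the build copies src/main/resources onto the classpath (the Gradle and Maven default). The class and method names (PipelineResources, readUtf8, readResource) are hypothetical and not part of the driver; the code uses only the JDK (Java 9+ for readAllBytes).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the MongoDB driver.
final class PipelineResources {

    private PipelineResources() {}

    /** Reads the whole stream as UTF-8 text. */
    static String readUtf8(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Loads a classpath resource such as "physics/test.json" as a String.
     * Assumes the file lives under src/main/resources and is packaged with the app.
     */
    static String readResource(String name) {
        try (InputStream in = PipelineResources.class.getClassLoader().getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found on classpath: " + name);
            }
            return readUtf8(in);
        } catch (IOException e) {
            // Only reachable if closing the stream fails.
            throw new UncheckedIOException(e);
        }
    }
}
```

The string returned by readResource("physics/test.json") could then be passed to new JsonReader(json) in place of the hard-coded text block above.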

Regards,
Jeff

@Jeffrey_Yemin Is there a way to get the result of the aggregation as a Json String?

There is no way for the driver to provide the results as a JSON array, but this will get you pretty close:

    import org.bson.json.JsonObject;
    // ...
    
    List<JsonObject> results = client.getDatabase("test").getCollection("test").withDocumentClass(JsonObject.class)
            .aggregate(pipeline).into(new ArrayList<>());
    
    for (JsonObject cur: results) {
        System.out.println(cur.getJson());
    }
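
If a single JSON array string is what you need, one option is to join the per-document strings yourself. This is only a sketch, assuming every element is already valid JSON (as the strings returned by getJson() should be); JsonArrayJoiner and toJsonArray are hypothetical names and the code uses only the JDK.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the MongoDB driver.
final class JsonArrayJoiner {

    private JsonArrayJoiner() {}

    /** Wraps already-serialized JSON documents in [ ... ], separated by commas. */
    static String toJsonArray(List<String> jsonDocuments) {
        return jsonDocuments.stream().collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> docs = List.of("{\"qty\": 3}", "{\"qty\": 5}");
        System.out.println(toJsonArray(docs)); // prints [{"qty": 3},{"qty": 5}]
    }
}
```

With the results list above, the input could be built with results.stream().map(JsonObject::getJson).collect(Collectors.toList()).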

@Jeffrey_Yemin This solved this problem although I am getting a List of BsonDocuments rather than JsonObjects. But I am facing another issue that all my aggregations which ran quite fast with NodeJS are running very slowly with Java. They are taking more than two minutes whereas in Node it was a second. Could you give me some ideas as to why this is happening?

It was just Document not BsonDocument.

@Jeffrey_Yemin This solved this problem. But I am facing another issue that all my aggregations which ran quite fast with NodeJS are running very slowly with Java. They are taking more than two minutes whereas in Node it was a second. Could you give me some ideas as to why this is happening?

Make sure you’re tacking withDocumentClass(JsonObject.class) on to your MongoCollection, as in the snippet above. It will change the type of the objects representing the documents in the collection.

As for the query performance, it’s unlikely that it’s caused by the driver. I’m guessing there is some subtle difference in the pipeline that is causing it. You can try running explain on the pipeline to see what the server is doing.

Do not forget that what you get is a cursor. If you do not iterate over the cursor, no documents are retrieved from the server. The call to into() in your Java code does iterate and retrieve documents. If you publish your Node code, we could try to see what the difference is.
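
To compare like with like, it may help to time a full iteration on the Java side and then check whether the Node code also consumes every document. A rough sketch, with hypothetical names (DrainTimer, drain, Result) and JDK classes only: it accepts any Iterable, which includes the AggregateIterable returned by aggregate().

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the MongoDB driver.
final class DrainTimer {

    /** How many items were seen and how long the full iteration took. */
    static final class Result {
        final long count;
        final long elapsedMillis;

        Result(long count, long elapsedMillis) {
            this.count = count;
            this.elapsedMillis = elapsedMillis;
        }
    }

    private DrainTimer() {}

    /** Iterates the source to the end, counting items and measuring wall-clock time. */
    static Result drain(Iterable<?> source) {
        long start = System.nanoTime();
        long count = 0;
        for (Object ignored : source) {
            count++;
        }
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Result(count, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in data; in the application this would be collection.aggregate(pipeline).
        Result r = drain(List.of("a", "b", "c"));
        System.out.println(r.count + " items in " + r.elapsedMillis + " ms");
    }
}
```

If DrainTimer.drain(collection.aggregate(pipeline)) reports a very large count, the two minutes are probably being spent transferring documents, and not on the aggregation itself.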

I like that hypothesis, @steevej.

Thanks,
Jeff

@steevej and @Jeffrey_Yemin

I tried running it with various aggregations and I am getting the same delay. Even with a simple projection like this, it is taking around 2 minutes. Could it be related to the fact that I am using JSON and not the helper functions?

[
  {
    "$project":{
      "upc":1,
      "inventory_name":1,
      "street_date":1,
      "format":1,
      "sub_studio":1,
      "us_base":1,
      "us_srp":1,
      "disc_num":1,
      "modified_date":1
    }
  }
]

I tried running this explanation query:

Document explanation = collection.aggregate(pipeline).explain(ExplainVerbosity.EXECUTION_STATS);

List<Document> stages = explanation.get("stages", List.class);
List<String> keys = Arrays.asList("queryPlanner", "winningPlan");

for (Document stage : stages) {
    Document cursorStage = stage.get("$cursor", Document.class);
    if (cursorStage != null) {
        System.out.println(cursorStage.getEmbedded(keys, Document.class).toJson());
    }
}

The explanation for this aggregation is:

{"stage": "PROJECTION_SIMPLE", "transformBy": {"cidm_retailer_type": 1, "date_sk": 1, "extd_price": 1, "format": 1, "inventory_name": 1, "invoice_id": 1, "qty": 1, "retailer_id": 1, "retailer_name": 1, "street_date": 1, "sub_studio": 1, "upc": 1, "_id": 0}, "inputStage": {"stage": "COLLSCAN", "direction": "forward"}}