Out of memory: Java Heap Space with mongodb-driver-core:4.6.1 with ReactiveMongoTemplate

Hi,

I am migrating my application from JBoss to Spring Boot. In both versions I use the Spring ReactiveMongoTemplate bean; the Spring Boot version requires a newer mongodb-driver-core dependency.

FYI, I don't have any problem with the JBoss version, but there seems to be a memory leak in the Spring Boot version.

I have tried playing with minPoolSize and maxPoolSize, but no matter the configuration I always hit the same problem with my Spring Boot app.
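For reference, these are the pool options I have been varying, set on the connection string (the host, database, and values below are just placeholders, not a recommendation):

```
mongodb://myhost:27017/mydb?minPoolSize=10&maxPoolSize=100
```

The same settings can also be applied programmatically via MongoClientSettings.builder().applyToConnectionPoolSettings(...).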

Here is the difference:

mongodb-driver-core:3.12.2 with JBoss (no problem after injecting 50 queries/sec for 10 minutes)

mongodb-driver-core:4.6.1 with Spring Boot (Java heap space after injecting 50 queries/sec for 2 minutes)

I noticed that the async code changed a lot between the two versions.

3.12.2: https://github.com/mongodb/mongo-java-driver/blob/r3.12.2/driver-core/src/main/com/mongodb/internal/connection/DefaultConnectionPool.java

4.6.1: https://github.com/mongodb/mongo-java-driver/blob/r4.6.1/driver-core/src/main/com/mongodb/internal/connection/DefaultConnectionPool.java

In my heap dump I see a lot of `LinkedBlockingQueue` instances.

Leak Suspects

One instance of “com.mongodb.internal.connection.DefaultConnectionPool$AsyncWorkManager” loaded by “jdk.internal.loader.ClassLoaders$AppClassLoader @ 0xe085d958” occupies 295,880,152 bytes (59.69 %).

Keywords
com.mongodb.internal.connection.DefaultConnectionPool$AsyncWorkManager
jdk.internal.loader.ClassLoaders$AppClassLoader @ 0xe085d958

Is there a bug in the driver, or do I need to add configuration to support the same load as on my JBoss instance?

Here is the application code sample (the same in the JBoss and Spring Boot versions):

import com.myCorp.model.MediaStatusRepository;
import com.myCorp.model.PushStatus;
import org.springframework.data.mongodb.core.ReactiveMongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

import javax.inject.Inject;
import javax.inject.Named;

@Component
@Named("myRepository")
public class MyMongoMongoRepository {

    private static final String COLLECTION_NAME = "my_collection";
    private static final int LIMIT = 50;

    @Inject
    private ReactiveMongoTemplate mongoTemplate;


    public Mono<PushStatus> save(PushStatus pushStatus) {
        return mongoTemplate.insert(pushStatus, COLLECTION_NAME)
                .doOnSuccess(p -> System.out.println("saved"))
                .doOnError(t -> System.out.println("error"));
    }

    public Flux<PushStatus> find(String myKey, String myRef) {
        Query query = new Query();
        query.addCriteria(Criteria.where("myKey").is(myKey));
        query.addCriteria(Criteria.where("myRef").is(myRef));
        query.limit(LIMIT);
        return mongoTemplate.find(query, PushStatus.class, COLLECTION_NAME);
    }
}

Thanks

It’s hard to tell. The only possibly relevant change I’m aware of is in the 4.0 upgrade notes:

The connection pool no longer enforces any restrictions on the size of the wait queue of threads or asynchronous tasks that require a connection to MongoDB. It is up to the application to throttle requests sufficiently rather than rely on the driver to throw a MongoWaitQueueFullException.

But if you weren’t getting any MongoWaitQueueFullException exceptions thrown from the 3.12 driver then that’s probably not it.

It might help the diagnosis along if you could reproduce this in a standalone application.

Hi @Jeffrey_Yemin ,

Thanks for your answer. How can the driver throw a MongoWaitQueueFullException in v4, given that the wait queue size is no longer configurable in the connection string? How can I regulate it from the client side?

Maybe the problem comes from the AsyncWorkManager (that worker does not exist in v3).

Hi @Jeffrey_Yemin ,

Here are the results of the benchmark I ran locally.

The benchmark was made with Gatling, generating 100 writes/sec and 500 reads/sec for five minutes in both cases.

  • The first result was from a JBoss instance (512M) using mongo-java-driver:3.12.2
    No problem observed, except for a queue size > 500 for a few seconds at instance startup

Sample output from the mongostat command:

insert query update delete getmore command flushes mapped vsize res faults qrw arw net_in net_out conn time
102 510 *0 *0 0 110|0 0 0B 2.20G 664M 0 0|0 0|0 244k 8.67m 229 Jun 22 16:17:49.931
100 504 *0 *0 0 101|0 0 0B 2.20G 664M 0 0|0 0|0 241k 8.58m 229 Jun 22 16:17:50.931
98 490 *0 *0 0 100|0 0 0B 2.20G 664M 0 0|0 0|0 234k 8.33m 229 Jun 22 16:17:51.931
102 511 *0 *0 0 103|0 0 0B 2.20G 664M 0 0|0 0|0 243k 8.67m 229 Jun 22 16:17:52.931
97 489 *0 *0 0 98|0 0 0B 2.20G 664M 0 0|0 0|0 233k 8.28m 229 Jun 22 16:17:53.930
103 510 *0 *0 0 105|0 0 0B 2.20G 664M 0 0|0 0|0 244k 8.72m 229 Jun 22 16:17:54.930
100 503 *0 *0 0 102|0 0 0B 2.20G 664M 0 0|0 0|0 240k 8.56m 229 Jun 22 16:17:55.931
100 501 *0 *0 0 104|0 0 0B 2.20G 664M 0 0|0 0|0 239k 8.52m 229 Jun 22 16:17:56.931
97 489 *0 *0 0 100|0 0 0B 2.20G 664M 0 0|0 0|0 234k 8.33m 229 Jun 22 16:17:57.931
99 500 *0 *0 0 103|0 0 0B 2.20G 664M 0 0|0 0|0 239k 8.50m 229 Jun 22 16:17:58.931

We can see that the reads/writes are roughly linear throughout.

  • The second result was from a Spring Boot instance (512M) using mongo-java-driver:4.6.1

There is a very rapid increase in the number of tasks to be processed, which causes the Java heap space error after about 61,000 queued tasks.

Sample output from the mongostat command:

insert query update delete getmore command flushes mapped vsize res faults qrw arw net_in net_out conn time
74 362 *0 *0 168 259|0 0 0B 2.15G 634M 0 0|0 0|0 264k 5.12m 169 Jun 22 16:03:34.017
73 363 *0 *0 243 332|0 0 0B 2.15G 634M 0 0|0 0|0 301k 5.59m 177 Jun 22 16:03:35.013
74 376 *0 *0 296 389|0 0 0B 2.16G 634M 0 0|0 0|0 334k 6.12m 183 Jun 22 16:03:36.014
76 392 *0 *0 405 494|0 0 0B 2.17G 634M 0 0|0 0|0 395k 6.76m 191 Jun 22 16:03:37.016
75 364 *0 *0 264 363|0 0 0B 2.17G 634M 0 0|0 0|0 314k 5.93m 196 Jun 22 16:03:38.014
93 470 *0 *0 331 433|0 0 0B 2.18G 634M 0 0|0 0|0 393k 7.31m 202 Jun 22 16:03:39.018
97 490 *0 *0 356 468|0 0 0B 2.18G 634M 0 0|0 0|0 417k 7.75m 208 Jun 22 16:03:40.014
90 468 *0 *0 361 467|0 0 0B 2.19G 634M 0 0|0 0|0 409k 7.40m 212 Jun 22 16:03:41.015
87 438 *0 *0 345 446|0 0 0B 2.19G 634M 0 0|0 0|0 387k 7.06m 218 Jun 22 16:03:42.015
100 486 *0 *0 412 497|0 0 0B 2.20G 634M 0 0|0 0|0 436k 8.14m 222 Jun 22 16:03:43.015

We can see that the reads/writes are not linear, hence the constantly growing queue.

Can you help me please? If you want, I can share the two projects with the Gatling scenario as attachments to reproduce the problem.

Thanks,
Julien

Have you tried simply limiting the number of concurrent (but still asynchronous) operations to 500 (using a Semaphore with 500 permits, for example)?

I wonder if that will keep things steady.
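The Semaphore idea can be sketched as below. Note this is just an illustration of the throttling pattern, not the driver's own API: simulateDriverCall() is a hypothetical stand-in for a real reactive call such as mongoTemplate.insert(...), and the acquire()/release() pair around each async operation is what bounds how many tasks can pile up in the driver's internal work queue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: cap the number of in-flight async operations with a Semaphore so
// the driver's internal work queue cannot grow without bound.
public class ThrottledClient {
    static final int MAX_IN_FLIGHT = 500;
    private final Semaphore permits = new Semaphore(MAX_IN_FLIGHT);
    private final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger maxObserved = new AtomicInteger();
    final ScheduledExecutorService timer = Executors.newScheduledThreadPool(4);

    public CompletableFuture<Void> submit() throws InterruptedException {
        permits.acquire();                        // blocks the caller while 500 ops are in flight
        int now = inFlight.incrementAndGet();
        maxObserved.accumulateAndGet(now, Math::max);
        return simulateDriverCall().whenComplete((v, t) -> {
            inFlight.decrementAndGet();
            permits.release();                    // free a slot once the async op finishes
        });
    }

    // Hypothetical stand-in for a real driver call (e.g. mongoTemplate.insert(...)):
    // completes asynchronously after a short delay.
    private CompletableFuture<Void> simulateDriverCall() {
        CompletableFuture<Void> f = new CompletableFuture<>();
        timer.schedule(() -> f.complete(null), 5, TimeUnit.MILLISECONDS);
        return f;
    }

    public static void main(String[] args) throws Exception {
        ThrottledClient client = new ThrottledClient();
        CountDownLatch done = new CountDownLatch(2000);
        for (int i = 0; i < 2000; i++) {
            client.submit().whenComplete((v, t) -> done.countDown());
        }
        done.await();
        client.timer.shutdown();
        // the observed concurrency never exceeds the permit count
        if (client.maxObserved.get() > MAX_IN_FLIGHT) {
            throw new AssertionError("throttle exceeded: " + client.maxObserved.get());
        }
        System.out.println("max in flight = " + client.maxObserved.get());
    }
}
```

In Reactor-based code the same effect can often be achieved without a Semaphore via the concurrency argument of Flux.flatMap(mapper, 500), which limits how many inner publishers are subscribed concurrently.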

Jeff

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.