Out of memory: Java Heap Space with mongodb-driver-core:4.6.1 with ReactiveMongoTemplate

Hi,

I am migrating my application from JBoss to Spring Boot. In both versions I use the Spring ReactiveMongoTemplate bean; the Spring Boot version requires a newer mongodb-driver-core dependency.

FYI, I don't have any problem with the JBoss version, but there seems to be a memory leak in the Spring Boot version.

I have tried playing with minPoolSize and maxPoolSize, but no matter the configuration I always hit the same problem with my Spring Boot app.
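For reference, these are the pool options I have been varying, set on the connection string (the host, database, and values below are just placeholders, not a recommendation):

```
mongodb://myhost:27017/mydb?minPoolSize=10&maxPoolSize=100
```

The same settings can also be applied programmatically via MongoClientSettings.builder().applyToConnectionPoolSettings(...).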

Here is the difference:

mongodb-driver-core:3.12.2 with JBoss (no problem after injecting 50 queries/sec for 10 minutes)

mongodb-driver-core:4.6.1 with Spring Boot (Java heap space after injecting 50 queries/sec for 2 minutes)

I noticed that the async code changed a lot between the two versions.

3.12.2: https://github.com/mongodb/mongo-java-driver/blob/r3.12.2/driver-core/src/main/com/mongodb/internal/connection/DefaultConnectionPool.java

4.6.1: https://github.com/mongodb/mongo-java-driver/blob/r4.6.1/driver-core/src/main/com/mongodb/internal/connection/DefaultConnectionPool.java

In my heap dump I see a lot of `LinkedBlockingQueue` instances.

Leak Suspects

One instance of “com.mongodb.internal.connection.DefaultConnectionPool$AsyncWorkManager” loaded by “jdk.internal.loader.ClassLoaders$AppClassLoader @ 0xe085d958” occupies 295,880,152 bytes (59.69 %).

Keywords
com.mongodb.internal.connection.DefaultConnectionPool$AsyncWorkManager
jdk.internal.loader.ClassLoaders$AppClassLoader @ 0xe085d958

Is there a bug in the driver, or do I need to add configuration to support the same load as on my JBoss instance?

Here is the application code sample (the same in the JBoss and Spring Boot versions):

import com.myCorp.model.MediaStatusRepository;
import com.myCorp.model.PushStatus;
import org.springframework.data.mongodb.core.ReactiveMongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

import javax.inject.Inject;
import javax.inject.Named;

@Component
@Named("myRepository")
public class MyMongoMongoRepository {

    private static final String COLLECTION_NAME = "my_collection";
    private static final int LIMIT = 50;

    @Inject
    private ReactiveMongoTemplate mongoTemplate;


    public Mono<PushStatus> save(PushStatus pushStatus) {
        return mongoTemplate.insert(pushStatus, COLLECTION_NAME)
                .doOnSuccess(p -> System.out.println("saved"))
                .doOnError(t -> System.out.println("error"));
    }

    public Flux<PushStatus> find(String myKey, String myRef) {
        Query query = new Query();
        query.addCriteria(Criteria.where("myKey").is(myKey));
        query.addCriteria(Criteria.where("myRef").is(myRef));
        query.limit(LIMIT);
        return mongoTemplate.find(query, PushStatus.class, COLLECTION_NAME);
    }
}

Thanks

It’s hard to tell. The only possibly relevant change I’m aware of is in the 4.0 upgrade notes:

The connection pool no longer enforces any restrictions on the size of the wait queue of threads or asynchronous tasks that require a connection to MongoDB. It is up to the application to throttle requests sufficiently rather than rely on the driver to throw a MongoWaitQueueFullException.

But if you weren’t getting any MongoWaitQueueFullException exceptions thrown from the 3.12 driver then that’s probably not it.

It might help the diagnosis along if you could reproduce this in a standalone application.

Hi @Jeffrey_Yemin ,

Thanks for your answer. How can the driver throw a MongoWaitQueueFullException in v4, given that the wait queue size is no longer configurable in the connection string? How can I regulate it from the client side?

Maybe the problem comes from the AsyncWorkManager (that worker does not exist in v3).

Hi @Jeffrey_Yemin ,

Here are the results of the benchmark I ran locally.

The benchmark was made with Gatling, generating 100 writes/sec and 500 reads/sec for five minutes in both cases.

  • The first result was from a JBoss instance (512M) using mongo-java-driver:3.12.2
    No problem observed, except for a queue size > 500 for a few seconds at instance startup

Sample output from the mongostat command:

insert query update delete getmore command flushes mapped vsize res faults qrw arw net_in net_out conn time
102 510 *0 *0 0 110|0 0 0B 2.20G 664M 0 0|0 0|0 244k 8.67m 229 Jun 22 16:17:49.931
100 504 *0 *0 0 101|0 0 0B 2.20G 664M 0 0|0 0|0 241k 8.58m 229 Jun 22 16:17:50.931
98 490 *0 *0 0 100|0 0 0B 2.20G 664M 0 0|0 0|0 234k 8.33m 229 Jun 22 16:17:51.931
102 511 *0 *0 0 103|0 0 0B 2.20G 664M 0 0|0 0|0 243k 8.67m 229 Jun 22 16:17:52.931
97 489 *0 *0 0 98|0 0 0B 2.20G 664M 0 0|0 0|0 233k 8.28m 229 Jun 22 16:17:53.930
103 510 *0 *0 0 105|0 0 0B 2.20G 664M 0 0|0 0|0 244k 8.72m 229 Jun 22 16:17:54.930
100 503 *0 *0 0 102|0 0 0B 2.20G 664M 0 0|0 0|0 240k 8.56m 229 Jun 22 16:17:55.931
100 501 *0 *0 0 104|0 0 0B 2.20G 664M 0 0|0 0|0 239k 8.52m 229 Jun 22 16:17:56.931
97 489 *0 *0 0 100|0 0 0B 2.20G 664M 0 0|0 0|0 234k 8.33m 229 Jun 22 16:17:57.931
99 500 *0 *0 0 103|0 0 0B 2.20G 664M 0 0|0 0|0 239k 8.50m 229 Jun 22 16:17:58.931

We can see that the reads/writes are roughly linear throughout.

  • The second result was from a Spring Boot instance (512M) using mongo-java-driver:4.6.1

There is a very rapid increase in the number of tasks to be processed, which causes the Java heap space error after about 61,000 queued tasks.

Sample output from the mongostat command:

insert query update delete getmore command flushes mapped vsize res faults qrw arw net_in net_out conn time
74 362 *0 *0 168 259|0 0 0B 2.15G 634M 0 0|0 0|0 264k 5.12m 169 Jun 22 16:03:34.017
73 363 *0 *0 243 332|0 0 0B 2.15G 634M 0 0|0 0|0 301k 5.59m 177 Jun 22 16:03:35.013
74 376 *0 *0 296 389|0 0 0B 2.16G 634M 0 0|0 0|0 334k 6.12m 183 Jun 22 16:03:36.014
76 392 *0 *0 405 494|0 0 0B 2.17G 634M 0 0|0 0|0 395k 6.76m 191 Jun 22 16:03:37.016
75 364 *0 *0 264 363|0 0 0B 2.17G 634M 0 0|0 0|0 314k 5.93m 196 Jun 22 16:03:38.014
93 470 *0 *0 331 433|0 0 0B 2.18G 634M 0 0|0 0|0 393k 7.31m 202 Jun 22 16:03:39.018
97 490 *0 *0 356 468|0 0 0B 2.18G 634M 0 0|0 0|0 417k 7.75m 208 Jun 22 16:03:40.014
90 468 *0 *0 361 467|0 0 0B 2.19G 634M 0 0|0 0|0 409k 7.40m 212 Jun 22 16:03:41.015
87 438 *0 *0 345 446|0 0 0B 2.19G 634M 0 0|0 0|0 387k 7.06m 218 Jun 22 16:03:42.015
100 486 *0 *0 412 497|0 0 0B 2.20G 634M 0 0|0 0|0 436k 8.14m 222 Jun 22 16:03:43.015

We can see that the reads/writes are not linear, hence the constantly growing queue.

Can you help me please? If you want, I can share the two projects with the Gatling scenario as attachments to reproduce the problem.

Thanks,
Julien

Have you tried simply limiting the number of concurrent (but still asynchronous) operations to 500 (using a Semaphore with 500 permits, for example)?

I wonder if that will keep things steady.
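The Semaphore idea can be sketched as below. Note this is just an illustration of the throttling pattern, not the driver's own API: simulateDriverCall() is a hypothetical stand-in for a real reactive call such as mongoTemplate.insert(...), and the acquire()/release() pair around each async operation is what bounds how many tasks can pile up in the driver's internal work queue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: cap the number of in-flight async operations with a Semaphore so
// the driver's internal work queue cannot grow without bound.
public class ThrottledClient {
    static final int MAX_IN_FLIGHT = 500;
    private final Semaphore permits = new Semaphore(MAX_IN_FLIGHT);
    private final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger maxObserved = new AtomicInteger();
    final ScheduledExecutorService timer = Executors.newScheduledThreadPool(4);

    public CompletableFuture<Void> submit() throws InterruptedException {
        permits.acquire();                        // blocks the caller while 500 ops are in flight
        int now = inFlight.incrementAndGet();
        maxObserved.accumulateAndGet(now, Math::max);
        return simulateDriverCall().whenComplete((v, t) -> {
            inFlight.decrementAndGet();
            permits.release();                    // free a slot once the async op finishes
        });
    }

    // Hypothetical stand-in for a real driver call (e.g. mongoTemplate.insert(...)):
    // completes asynchronously after a short delay.
    private CompletableFuture<Void> simulateDriverCall() {
        CompletableFuture<Void> f = new CompletableFuture<>();
        timer.schedule(() -> f.complete(null), 5, TimeUnit.MILLISECONDS);
        return f;
    }

    public static void main(String[] args) throws Exception {
        ThrottledClient client = new ThrottledClient();
        CountDownLatch done = new CountDownLatch(2000);
        for (int i = 0; i < 2000; i++) {
            client.submit().whenComplete((v, t) -> done.countDown());
        }
        done.await();
        client.timer.shutdown();
        // the observed concurrency never exceeds the permit count
        if (client.maxObserved.get() > MAX_IN_FLIGHT) {
            throw new AssertionError("throttle exceeded: " + client.maxObserved.get());
        }
        System.out.println("max in flight = " + client.maxObserved.get());
    }
}
```

In Reactor-based code the same effect can often be achieved without a Semaphore via the concurrency argument of Flux.flatMap(mapper, 500), which limits how many inner publishers are subscribed concurrently.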

Jeff

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.