Any chance of getting a new Java driver release with the latest bug fixes?

The mongo-java-driver seems to have some issues with how the connection pool is managed, eventually leading to a timeout once the pool reaches the maximum number of connections. (I believe it has been fixed in "Fix deadlock and couple more problems in `DefaultConnectionPool`" by stIncMale · Pull Request #699 · mongodb/mongo-java-driver · GitHub.)

Hi there.

That PR is a fix to another PR that has not been included yet in a release, so it would be unrelated to any issues you are seeing in a released version of the Java driver.

Can you elaborate on the issues that you are seeing (driver version, stack trace, logs)?

Regards,
Jeff


Hello,

Thanks for coming back so quickly.

This is the error message that we got back.

Timeout waiting for a pooled connection after 120000 MILLISECONDS

We enabled the JMXConnectionPoolListener and noticed the pool reaches the maximum size before this happens.
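
For reference, the listener is registered roughly like this (a minimal sketch, not our exact configuration; the connection string comes from an environment variable here):

```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.management.JMXConnectionPoolListener;

public class PoolMonitoring {
    public static void main(String[] args) {
        // Register the driver's JMX listener so pool statistics (pool size, checked-out
        // count, wait-queue size) are exposed as MBeans and can be watched in JConsole.
        MongoClientSettings settings = MongoClientSettings.builder()
                .applyConnectionString(new ConnectionString(System.getenv("MONGODB_URI")))
                .applyToConnectionPoolSettings(builder ->
                        builder.addConnectionPoolListener(new JMXConnectionPoolListener()))
                .build();

        try (MongoClient client = MongoClients.create(settings)) {
            // ... run the normal workload while observing the pool MBeans
        }
    }
}
```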

Running version 4.2.3, connecting to a replica set hosted in MongoDB Atlas.

This is not uncommon, and the root cause is not a bug in the driver. Rather, it’s usually one of:

  1. A connection pool that is just too small. The solution is to increase the size of the pool.
  2. Executing operations that are not optimized, e.g. a query on an unindexed field in a collection
  3. Networking issues

Some possibilities to debug:

  1. Increase log levels to DEBUG and analyze the logs to determine which operations are not completing fast enough
  2. Set socketTimeout (a global configuration) and maxTime (a per-operation option) to prevent runaway operations; a sketch follows this list.
  3. Do a thread dump of the JVM to gather stack traces to see where connections are being pinned on long-running operations. This will only be effective if you’re using the synchronous driver and not the Reactive Streams driver.
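
For item 2, this is a minimal sketch of what that could look like (the timeout values, database, collection, and filter are placeholders, not recommendations):

```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

import java.util.concurrent.TimeUnit;

public class TimeoutExample {
    public static void main(String[] args) {
        MongoClientSettings settings = MongoClientSettings.builder()
                .applyConnectionString(new ConnectionString(System.getenv("MONGODB_URI")))
                // Global cap on how long a socket read may block (the "socketTimeout").
                .applyToSocketSettings(builder -> builder.readTimeout(60, TimeUnit.SECONDS))
                .build();

        try (MongoClient client = MongoClients.create(settings)) {
            MongoCollection<Document> orders = client.getDatabase("test").getCollection("orders");

            // Per-operation cap: the server aborts the query if it runs longer than 30 seconds.
            Document first = orders.find(Filters.eq("status", "ACTIVE"))
                    .maxTime(30, TimeUnit.SECONDS)
                    .first();
            System.out.println(first);
        }
    }
}
```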

Regards,
Jeff

  1. A connection pool that is just too small. The solution is to increase the size of the pool.

The default pool size is 100; we increased it to 200 and the same thing happened.
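
For reference, the increase was done roughly like this (a sketch; the connection string is a placeholder):

```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;

public class PoolSizing {
    static MongoClientSettings settings() {
        // maxSize raised from the default of 100; maxWaitTime (default 2 minutes) is what
        // produces the "Timeout waiting for a pooled connection after 120000 MILLISECONDS" error.
        return MongoClientSettings.builder()
                .applyConnectionString(new ConnectionString(System.getenv("MONGODB_URI"))) // placeholder URI
                .applyToConnectionPoolSettings(builder -> builder.maxSize(200))
                .build();
    }
}
```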

  2. Executing operations that are not optimized, e.g. a query on an unindexed field in a collection

I’ll double-check this. I’m pretty sure all queries are run against a suitable index.
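
One way I plan to double-check is with explain() on the driver side (a sketch; the collection and filter are made up, and I believe explain() on FindIterable requires driver 4.2+):

```java
import com.mongodb.ExplainVerbosity;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class ExplainCheck {
    static void checkPlan(MongoCollection<Document> orders) {
        // Ask the server how it would execute the query; an unindexed query shows a
        // COLLSCAN stage in the winning plan instead of an IXSCAN.
        Document plan = orders.find(Filters.eq("customerId", 42))
                .explain(ExplainVerbosity.EXECUTION_STATS);
        System.out.println(plan.toJson());
    }
}
```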

  3. Networking issues

I’ve used netcat to test connectivity to all instances in the replica set, and all seemed well.

I’ll try switching to DEBUG and see if anything useful shows up. :+1:
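
In case it helps anyone else, this is roughly what I have in mind for bumping the driver's log level (assuming Logback is the SLF4J backend; adjust for your logging setup):

```java
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

public class DriverLogging {
    public static void enableDriverDebugLogging() {
        // The Java driver logs under the "org.mongodb.driver" prefix; raising it to DEBUG
        // surfaces connection checkout/checkin and command timing details.
        Logger driverLogger = (Logger) LoggerFactory.getLogger("org.mongodb.driver");
        driverLogger.setLevel(Level.DEBUG);
    }
}
```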

I managed to reproduce it locally.
At some point, the pool size reaches the maximum of 100. I stopped the load test on the app and then sent a single request that involves reading from MongoDB. The response is a timeout.
I get this timeout even though there is no longer any load on the application.
I took a thread dump and noticed a thread called AsyncGetter in the TIMED_WAITING state, after which I get the timeout.
E.g.

"AsyncGetter-4-thread-1" - Thread t@82
java.lang.Thread.State: TIMED_WAITING
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <4a5f1261> (a java.util.concurrent.Semaphore$FairSync)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
at com.mongodb.internal.connection.ConcurrentPool.acquirePermit(ConcurrentPool.java:197)
at com.mongodb.internal.connection.ConcurrentPool.get(ConcurrentPool.java:140)
at com.mongodb.internal.connection.DefaultConnectionPool.getPooledConnection(DefaultConnectionPool.java:284)
at com.mongodb.internal.connection.DefaultConnectionPool.access$200(DefaultConnectionPool.java:63)
at com.mongodb.internal.connection.DefaultConnectionPool$1.run(DefaultConnectionPool.java:164)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- locked <1c600001> (a java.util.concurrent.ThreadPoolExecutor$Worker)

Any reason why a connection couldn’t be retrieved when the pool has 100 open connections?

We are also observing a similar issue with Java driver 3.12. Has there been any fix suggested for this issue?
From the thread dump I see a few AsyncGetter threads stuck in TIMED_WAITING. This causes the application to get stuck and not recover by itself until the application is restarted.
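
For anyone looking at this, a minimal sketch of the 3.x settings involved (the values shown are just the defaults, not our actual configuration; the connection string is a placeholder):

```java
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.MongoClientURI;

public class LegacyClientConfig {
    static MongoClient create() {
        // connectionsPerHost is the pool size (default 100); maxWaitTime is how long a thread
        // waits for a pooled connection before the timeout error is raised (default 120000 ms).
        MongoClientOptions.Builder options = MongoClientOptions.builder()
                .connectionsPerHost(100)
                .maxWaitTime(120_000);
        return new MongoClient(new MongoClientURI(System.getenv("MONGODB_URI"), options)); // placeholder URI
    }
}
```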