DocumentDB cluster: invalidating the connection pool kills connections of other microservices

Hi there,

The combination of AWS DocumentDB, Cloud Foundry, Spring Boot, Spring Data, the Mongo Java Driver and blue/green deployments has caused my team a lot of trouble over the last few days. We are somewhat stuck and are looking for insights that we might be missing.

We are using AWS DocumentDB, which is a MongoDB-compatible clustered database by AWS. We have a Spring Boot microservice which connects to the DocumentDB cluster via the cluster endpoint.

Short story:
When our Spring Boot microservice shuts down, the MongoDB driver gets an interrupt which invalidates the connection pool. Somehow, this seems to close all connections of other microservices as well. The other microservice takes around one minute to be able to open a new connection to the DocumentDB cluster.

Long story:

  1. We deploy a new Spring Boot microservice to Cloud Foundry via blue/green deployment. The previous microservice keeps running until the new microservice starts up and becomes healthy.

  2. After the new application gets healthy, CF Diego stops the venerable (previous) microservice.

  3. The stopped microservice receives a SIGTERM signal. Spring Boot attempts a graceful shutdown and interrupts all running threads, which sometimes causes a MongoDB driver interrupt.

  4. After the interrupt, the MongoDB driver invalidates the connection pool. So far so good… Just a couple of milliseconds after the connection pool invalidation, the new microservice, which was healthy before, also loses its connection to the DocumentDB cluster. It restarts. Even after restarting, it gets connection refused exceptions. Around 1 minute after the connection pool invalidation, the new microservice can open a connection again.
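To illustrate step 3 in isolation, here is a minimal sketch of how shutting down an executor interrupts a thread that is blocked in a long-running call. It assumes the interrupt originates from something like `ExecutorService.shutdownNow()`; the actual source of the interrupt in our application may differ. No MongoDB code is involved: `Thread.sleep` is only a stand-in for a blocking driver call, and the class and method names are made up for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptOnShutdownDemo {

    // Returns true if the worker thread observed an interrupt after shutdownNow().
    public static boolean workerInterruptedOnShutdown() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);

        pool.submit(() -> {
            started.countDown();
            try {
                // Stand-in for a blocking database call (assumption, not driver code).
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // A real driver would surface this as an interrupted-operation error.
                interrupted.set(true);
            }
        });

        started.await();                              // make sure the task is running
        pool.shutdownNow();                           // interrupts the running worker
        pool.awaitTermination(5, TimeUnit.SECONDS);   // wait for the task to finish
        return interrupted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker interrupted: " + workerInterruptedOnShutdown());
    }
}
```

This only shows that an in-flight blocking call is cut short on shutdown. It does not explain why a different process would lose its connections.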

I cannot reproduce the error locally with MongoDB and multiple applications. My guess is that the error only occurs with clusters or with DocumentDB.

Do you have any ideas about possible causes? Why would shutting down a Spring Boot application and the resulting MongoDB driver interrupts cause global connection errors that persist for 1 minute?
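One way to measure the outage window independently of the driver is a plain TCP probe that polls the endpoint until a connect succeeds. The following is a rough sketch under that assumption. It only checks TCP reachability (which is where "connection refused" shows up) and does not perform TLS or authentication. The class name, method names and the host and port arguments are placeholders for this example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ReconnectProbe {

    // Tries a plain TCP connect; returns true if the endpoint accepted it.
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "connection refused" as well as timeouts.
            return false;
        }
    }

    // Polls until a connect succeeds; returns elapsed milliseconds,
    // or -1 if the endpoint never became reachable within maxAttempts.
    public static long millisUntilReachable(String host, int port,
                                            int maxAttempts, long pauseMillis)
            throws InterruptedException {
        long start = System.nanoTime();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (canConnect(host, port, 1000)) {
                return (System.nanoTime() - start) / 1_000_000;
            }
            Thread.sleep(pauseMillis);
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length < 2) {
            System.out.println("usage: ReconnectProbe <host> <port>");
            return;
        }
        long elapsed = millisUntilReachable(args[0], Integer.parseInt(args[1]), 120, 1000);
        System.out.println(elapsed < 0 ? "never reachable" : elapsed + " ms until reachable");
    }
}
```

Running this from inside the same Cloud Foundry space and from a host outside of it during a deployment could help tell whether the refusals come from the database side or from the platform network.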

The problem exists with Spring Boot versions 2.1.7.RELEASE and 2.3.0.RELEASE. The latter ships with MongoDB Java Driver 4.0.x.

Hi @Artun_Subasi,

Amazon DocumentDB is a separate implementation from the MongoDB server.

DocumentDB uses the MongoDB 3.6 wire protocol, but there are a number of functional differences and the supported commands are a subset of those available in MongoDB 3.6.

If your issue isn’t reproducible with an actual MongoDB deployment, there isn’t much we can do to investigate. You could try standing up a MongoDB Atlas cluster to compare behaviour. The free tier in Atlas provides a basic replica set with 512MB of storage.

For support using DocumentDB I suggest posting on the AWS Forums or Stack Overflow.

Regards,
Stennie

Thanks Stennie,

I was just looking for insights and ideas in case I'm missing something. I'll investigate further using other support channels and report back here if I find something new.

I created a minimal project that attempts to reproduce the problem with either DocumentDB or MongoDB Atlas: ArtunSubasi/documentdb-cluster-tester on GitHub (a sample app which attempts to reproduce a connection problem).

It seems to work fine with DocumentDB. The problem may exist within our Cloud Foundry infrastructure.
