How to implement the idempotent consumer pattern with MongoDB?

Hi!

I am evaluating MongoDB as the primary operational database for a microservice. This microservice processes events that may be duplicates. An event is considered a duplicate if it is identical to another event that has already been processed by the microservice.

The microservice needs to handle these potentially duplicate events without generating any side effects (considering only database updates as side effects).

Assumptions: All events have a “message-id” field that can be used to determine if two messages are the same.

With a relational database, this problem can be solved quite easily: in the same transaction, we can bundle the updates to the business entities together with the insertion of a record whose primary key is the “message-id” of the processed event. When the microservice processes a duplicate event, the transaction aborts due to the unique key violation, allowing the application to catch this case and detect the duplicate.
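For illustration, here is a quick sketch of that relational pattern using Python’s built-in sqlite3 (table and column names are made up for the example):

```python
import sqlite3

conn = sqlite3.connect("service.db")
conn.execute("CREATE TABLE IF NOT EXISTS processed_events (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, status TEXT)")

def handle_event(message_id: str, order_id: str, new_status: str) -> None:
    try:
        # One transaction: the log insert and the business update
        # either both commit or both roll back.
        with conn:
            conn.execute(
                "INSERT INTO processed_events (message_id) VALUES (?)",
                (message_id,),
            )
            conn.execute(
                "UPDATE orders SET status = ? WHERE id = ?",
                (new_status, order_id),
            )
    except sqlite3.IntegrityError:
        pass  # primary-key violation: duplicate event, no side effects applied
```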

How would you approach the same problem with MongoDB? While we can use transactions in MongoDB, extensive use of transactions might affect its performance.

Another solution is to enrich all documents with a “message-ids” array and append the “message-id” of the event we are processing when updating that document. The updates can check if the current “message-id” is already present in the “message-ids” array, detecting if the event was already processed. However, I find this approach quite invasive as it changes the structure of business entities.
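A sketch of what I mean, with PyMongo (the “orders” collection and its fields are just an example):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["orders_service"]["orders"]

def apply_event(order_id: str, message_id: str, new_status: str) -> bool:
    # The filter only matches if the message-id is not yet in the array,
    # so a redelivered event matches nothing and the update is a no-op.
    result = orders.update_one(
        {"_id": order_id, "message_ids": {"$ne": message_id}},
        {"$set": {"status": new_status}, "$push": {"message_ids": message_id}},
    )
    return result.matched_count == 1  # False => duplicate (or unknown order)
```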

Am I missing any other possible solutions? How do you handle this kind of problem when using MongoDB as your primary operational database?

To handle duplicate event processing in MongoDB, the approach needs to ensure that the system can detect duplicates without introducing unnecessary complexity or side effects. It depends a lot on the use case, but here are my options:

1. Use a Separate Collection as an Event Processing Log with a Unique Index

One solution is to create a separate MongoDB collection that stores processed message-ids. This collection would have a unique index on the message-id field, and the idea is to insert the message-id when processing an event. If the message-id already exists (indicating the event is a duplicate), MongoDB will raise a duplicate key error. You just need to add a TTL index to this collection, matching the retention period of your messages, so it doesn’t grow infinitely.
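A minimal sketch with PyMongo; the collection and field names are assumptions, and the 7-day TTL is just an example retention period:

```python
from datetime import datetime, timezone

from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

client = MongoClient("mongodb://localhost:27017")
processed = client["orders_service"]["processed_events"]

# Unique index rejects duplicate message-ids; TTL index expires old entries.
processed.create_index("message_id", unique=True)
processed.create_index("processed_at", expireAfterSeconds=7 * 24 * 3600)

def handle_event(event: dict) -> None:
    try:
        processed.insert_one({
            "message_id": event["message-id"],
            "processed_at": datetime.now(timezone.utc),
        })
    except DuplicateKeyError:
        return  # duplicate event: skip without side effects
    # Note: without wrapping both writes in a transaction, a crash here
    # would lose the event (hence the "does not avoid transactions" con).
    apply_business_updates(event)  # hypothetical business logic
```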

Pros

  • Simple and effective
  • Efficient duplicate detection
  • Minimal impact on business entities

Cons

  • Requires an extra collection
  • Does not avoid transactions when the log insert and the business updates must stay atomic

2. Optimistic Locking with a message-id Embedded in Business Entities

Although you mentioned that embedding message-ids in the business entities might be invasive, another possible solution is to use optimistic locking: embed a single message-id in each relevant document and have the update’s filter require that the stored message-id differs from the incoming one, so a redelivered event matches nothing.
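A sketch of this variant, assuming a “last_message_id” field on the business document (names are illustrative):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["orders_service"]["orders"]

def apply_event(order_id: str, message_id: str, new_status: str) -> bool:
    # Match only if this message-id was not the last one applied; a
    # redelivered event matches nothing and the update becomes a no-op.
    # Only the most recent message-id is retained, hence the limited history.
    result = orders.update_one(
        {"_id": order_id, "last_message_id": {"$ne": message_id}},
        {"$set": {"status": new_status, "last_message_id": message_id}},
    )
    return result.matched_count == 1  # False => duplicate (or unknown order)
```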

Pros

  • Simpler than a transaction
  • Optimistic concurrency

Cons

  • May change the structure of business entities (depends on your architecture)
  • Limited history (only the most recent message-id is kept, so older duplicates can slip through)

Rather simplistic, but how about just using upserts keyed on the message-id?
(You can then also check the returned value to see whether an insert was performed.)
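Roughly like this with PyMongo (the “processed_events” collection name is assumed):

```python
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
processed = client["orders_service"]["processed_events"]

def first_time_seen(message_id: str) -> bool:
    result = processed.update_one(
        {"_id": message_id},  # key the document on the message-id itself
        {"$setOnInsert": {"processed_at": datetime.now(timezone.utc)}},
        upsert=True,
    )
    # upserted_id is only set when the upsert actually inserted a document,
    # i.e. this message-id had not been seen before.
    return result.upserted_id is not None
```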