
How to Implement Client-Side Field Level Encryption (CSFLE) in Java with Spring Data MongoDB

Maxime Beugnet, Megha Arora • 11 min read • Published Nov 06, 2023 • Updated Jan 27, 2024

GitHub Repository

The source code of this template is available on GitHub:
To get started, you'll need:
  • Java 17.
  • A MongoDB cluster v7.0.2 or higher.
See the README.md file for more information.

Video

This content is also available in video format.

Introduction

This post will explain the key details of the integration of MongoDB Client-Side Field Level Encryption (CSFLE) with Spring Data MongoDB.
However, this post will not explain the basic mechanics of CSFLE or Spring Data MongoDB.
If you feel like you need a refresher on CSFLE before working on this more complicated piece, I can recommend a few resources for CSFLE:
And for Spring Data MongoDB:
This template is significantly larger than the other CSFLE templates you can find online. It tries to provide reusable code for a real production environment using:
  • Multiple encrypted collections.
  • Automated JSON Schema generation.
  • Server-side JSON Schema.
  • Separate clusters for data encryption keys (DEKs) and encrypted collections.
  • Automated generation or retrieval of DEKs.
  • SpEL Evaluation Extension.
  • Auto-implemented repositories.
  • OpenAPI 3.0.1 documentation.
While I was coding, I also tried to respect the SOLID Principles as much as possible to improve the readability, usability, and reusability of the code.

High-Level Diagrams

Now that we are all on board, here is a high-level diagram of the different moving parts required to create a correctly configured, CSFLE-enabled MongoClient, which can encrypt and decrypt fields automatically.
Project high-level diagram
The arrows can mean different things in the diagram:
  • "needs to be done before"
  • "requires"
  • "direct dependency of"
But hopefully it helps explain the dependencies, the orchestration, and the inner machinery of the CSFLE configuration with Spring Data MongoDB.
Once the connection with MongoDB is established with the correct configuration and library, and is therefore capable of encrypting and decrypting the fields, we are just using a classic three-tier architecture to expose a REST API and manage the communication all the way down to the MongoDB database.
Three-tier architecture
There is nothing tricky or fascinating here, so we are not going to cover it in this post.
Let's now focus on all the complicated bits of this template.

Creation of the Key Vault Collection

As this is a tutorial, the code can be started from a blank MongoDB cluster.
So the first order of business is to create the key vault collection and its unique index on the keyAltNames field.
In production, you could choose to create the key vault collection and its unique index on the keyAltNames field manually once and remove the code as it's never going to be executed again. I guess it only makes sense to keep it if you are running this code in a CI/CD pipeline.
One important thing to note here is the dependency on a completely standard (i.e., not CSFLE-enabled) and ephemeral MongoClient (created in a try-with-resources block), as we need to create a collection and an index in our MongoDB cluster before any CSFLE-enabled connection exists.
When it's done, we can close the standard MongoDB connection.

Creation of the Data Encryption Keys

We can now create the Data Encryption Keys (DEKs) using the ClientEncryption connection.
We can directly instantiate a ClientEncryption bean using the KMS and use it to generate our DEKs (one for each encrypted collection).
One thing to note here is that we are storing the DEKs in a map, so we don't have to retrieve them again later when we need them for the JSON Schemas.
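To make the "generate or retrieve" idea concrete, here is a minimal sketch of the get-or-create logic with a map acting as a cache. It is not the template's code: the KeyVault interface, the InMemoryKeyVault class, and the choice of one keyAltName per encrypted collection are assumptions made for illustration, standing in for the calls you would make through ClientEncryption against the real key vault collection.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical abstraction over the key vault (not a MongoDB driver type).
interface KeyVault {
    Optional<UUID> findByAltName(String keyAltName);
    UUID create(String keyAltName);
}

// In-memory stand-in used only to keep this sketch runnable without a cluster.
class InMemoryKeyVault implements KeyVault {
    private final Map<String, UUID> keys = new LinkedHashMap<>();
    int created = 0; // counts how many DEKs were actually generated

    public Optional<UUID> findByAltName(String keyAltName) {
        return Optional.ofNullable(keys.get(keyAltName));
    }

    public UUID create(String keyAltName) {
        // Mimics the unique index on keyAltNames: a duplicate name is rejected.
        if (keys.containsKey(keyAltName)) {
            throw new IllegalStateException("Duplicate keyAltName: " + keyAltName);
        }
        UUID id = UUID.randomUUID();
        keys.put(keyAltName, id);
        created++;
        return id;
    }
}

class DataEncryptionKeyService {
    private final KeyVault keyVault;
    // DEKs are cached here so they can be reused when building the JSON Schemas.
    private final Map<String, UUID> deks = new LinkedHashMap<>();

    DataEncryptionKeyService(KeyVault keyVault) {
        this.keyVault = keyVault;
    }

    // Returns the cached DEK, or retrieves it from the vault, or creates it as a last resort.
    UUID getOrCreate(String keyAltName) {
        UUID cached = deks.get(keyAltName);
        if (cached != null) {
            return cached;
        }
        UUID dek = keyVault.findByAltName(keyAltName)
                           .orElseGet(() -> keyVault.create(keyAltName));
        deks.put(keyAltName, dek);
        return dek;
    }

    Map<String, UUID> all() {
        return Collections.unmodifiableMap(deks);
    }
}
```

With this shape, restarting the application against an existing key vault reuses the DEKs instead of creating duplicates, and the unique index on keyAltNames acts as a safety net.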

Entities

One of the key functional areas of Spring Data MongoDB is the POJO-centric model it relies on to implement the repositories and map the documents to the MongoDB collections.
PersonEntity.java
This entity contains all the information we need to fully automate CSFLE, including everything required to generate the JSON Schema:
  • Using the SpEL expression #{mongocrypt.keyId(#target)}, we can dynamically populate the DEK that was generated or retrieved earlier.
  • ssn is a String that requires a deterministic algorithm.
  • bloodType is a String that requires a random algorithm.
The generated JSON Schema looks like this:
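The exact output depends on your entities and DEKs, but the overall shape can be illustrated without Spring on the classpath. The sketch below is only an approximation of what Spring Data MongoDB does for you: EncryptedField is a local stand-in for the real @Encrypted annotation, PersonSketch is a hypothetical cut-down entity, and the key ID is a placeholder string rather than a real UUID.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Local stand-in for Spring Data MongoDB's @Encrypted (assumption for this sketch).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface EncryptedField {
    String algorithm();
}

// Hypothetical, simplified entity: only ssn and bloodType come from the post.
class PersonSketch {
    String firstName;
    @EncryptedField(algorithm = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic")
    String ssn;
    @EncryptedField(algorithm = "AEAD_AES_256_CBC_HMAC_SHA_512-Random")
    String bloodType;
}

class SchemaSketch {
    // Builds a CSFLE-style schema as nested maps by scanning annotated fields.
    static Map<String, Object> schemaFor(Class<?> type, String keyId) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            EncryptedField encrypted = field.getAnnotation(EncryptedField.class);
            if (encrypted == null) {
                continue; // unencrypted fields do not appear in the schema
            }
            if (field.getType() != String.class) {
                throw new UnsupportedOperationException("This sketch only maps String fields");
            }
            Map<String, Object> encrypt = new LinkedHashMap<>();
            encrypt.put("bsonType", "string");
            encrypt.put("algorithm", encrypted.algorithm());
            properties.put(field.getName(), Map.of("encrypt", encrypt));
        }
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("bsonType", "object");
        // In a real schema, keyId holds the DEK's UUID, not a string.
        schema.put("encryptMetadata", Map.of("keyId", List.of(keyId)));
        schema.put("properties", properties);
        return schema;
    }
}
```

Calling SchemaSketch.schemaFor(PersonSketch.class, "<DEK UUID>") returns a map with a bsonType, an encryptMetadata.keyId entry, and one encrypt block per annotated field, which mirrors the structure of a CSFLE JSON Schema.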

SpEL Evaluation Extension

The evaluation of the SpEL expression is only possible because of this class we added in the configuration:
Note that this is where we retrieve the DEKs and match them with the target ("PersonEntity" in this case).

JSON Schemas and the MongoClient Connection

JSON Schemas are actually not trivial to generate in a Spring Data MongoDB project.
As a matter of fact, to generate the JSON Schemas, we need the MappingContext (the entities, etc.), which is created by the automatic configuration of Spring Data, which also creates the MongoClient connection and the MongoTemplate...
But to create the MongoClient — with the automatic encryption enabled — you need JSON Schemas!
It took me a significant amount of time to find a solution to this deadlock, and you can just enjoy the solution now!
The solution is to inject the JSON Schema creation in the autoconfiguration process by instantiating the MongoClientSettingsBuilderCustomizer bean.
One thing to note here is the option to keep the DEKs and the encrypted collections in two completely separate MongoDB clusters. This isn't mandatory, but it can be a handy trick if you choose to have a different backup retention policy for each cluster. This can be interesting for the GDPR Article 17 "Right to erasure," for instance, as you can then guarantee that a DEK can completely disappear from your systems (backups included). I talk more about this approach in my Java CSFLE post.
Here is the JSON Schema service which stores the generated JSON Schemas in a map:
We are storing the JSON Schemas because this template also implements one of the good practices of CSFLE: server-side JSON Schemas.

Create or Update the Encrypted Collections

Strictly speaking, the automatic encryption and decryption of CSFLE does not require the server-side JSON Schemas.
Only the client-side ones are required for the Automatic Encryption Shared Library. But then nothing would prevent another misconfigured client, or an admin connected directly to the cluster, from inserting or updating documents without encrypting the fields.
To enforce encryption, you can use a server-side JSON Schema, just as you would to enforce a field type in a document.
But given that the JSON Schema will evolve with the different versions of your application, the JSON Schemas need to be updated accordingly each time you restart your application.
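One way to handle this at startup is to pick between the create and collMod database commands depending on whether the collection already exists, attaching the schema as a $jsonSchema validator in both cases. The helper below is a sketch with hypothetical names that only builds the command as a plain map; sending it to the cluster (for example, with runCommand) is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class EncryptedCollectionCommands {

    // Builds a "create" command for a new collection, or a "collMod" command
    // for an existing one, so the validator always matches the latest schema.
    static Map<String, Object> createOrUpdate(Set<String> existingCollections,
                                              String collection,
                                              Map<String, Object> jsonSchema) {
        // LinkedHashMap keeps the command name as the first key,
        // which is what the server expects in a command document.
        Map<String, Object> command = new LinkedHashMap<>();
        String commandName = existingCollections.contains(collection) ? "collMod" : "create";
        command.put(commandName, collection);
        command.put("validator", Map.of("$jsonSchema", jsonSchema));
        return command;
    }
}
```

Because the same code path runs on every startup, a schema change shipped with a new version of the application is pushed to the server without any manual step.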

Multi-Entities Support

Another big feature of this template is the support of multiple entities. As you have probably noticed already, there is a CompanyEntity and all its related components, but the code is generic enough to handle any number of entities, which usually isn't the case in other online tutorials.
In this template, if you want to support a third type of entity, you just have to create the components of the three-tier architecture as usual and add your entry in the EncryptedCollectionsConfiguration class.
Everything else from the DEK generation to the encrypted collection creation with the server-side JSON Schema is fully automated and taken care of transparently. All you have to do is specify the @Encrypted(algorithm = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic") annotation in the entity class and the field will be encrypted and decrypted automatically for you when you are using the auto-implemented repositories (courtesy of Spring Data MongoDB, of course!).

Query by an Encrypted Field

You may have noticed that this template implements the findFirstBySsn(ssn) method, which means that it's possible to retrieve a person document by its SSN, even though this field is encrypted.
Note that it only works because we are using a deterministic encryption algorithm.
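To see why, here is a small JDK-only illustration of the difference between the two families of algorithms. It is not the AEAD_AES_256_CBC_HMAC_SHA_512 implementation used by CSFLE: it simply uses AES-CBC with an IV derived from the plaintext for the deterministic case and a random IV for the randomized case, and the class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class EncryptionModesDemo {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Same key + same plaintext => same IV => same ciphertext.
    static byte[] deterministic(byte[] key, String plaintext) {
        try {
            // Illustration only: a real scheme would use separate keys for MAC and encryption.
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] digest = mac.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return encrypt(key, Arrays.copyOf(digest, 16), plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // A fresh random IV on every call => different ciphertext every time.
    static byte[] randomized(byte[] key, String plaintext) {
        byte[] iv = new byte[16];
        RANDOM.nextBytes(iv);
        return encrypt(key, iv, plaintext);
    }

    // AES-CBC encryption; the IV is prepended so the value could be decrypted later.
    private static byte[] encrypt(byte[] key, byte[] iv, String plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the deterministic variant always produces the same ciphertext for a given key and value, an equality query can be answered by encrypting the search value and comparing ciphertexts. The randomized variant gives up that ability in exchange for not revealing which documents share the same value.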

Wrapping Up

Thanks for reading my post!
If you have any questions about it, please feel free to open a question in the GitHub repository or ask a question in the MongoDB Community Forum.
Feel free to ping me directly in your post: @MaBeuLux88.
Pull requests and improvement ideas are very welcome!
