Java - Client Side Field Level Encryption
- Update Java Driver to 4.2.2.
- Added Client Side Field Level Encryption example.
- Update Java Driver to 4.1.1.
Client-Side Field Level Encryption (CSFLE) is the ultimate piece of security against any kind of intrusion or snooping around your MongoDB cluster: only the application with the correct encryption keys can decrypt and read the protected data.
Let's check out the Java CSFLE API with a simple example.
I will use the same repository as usual in this series. If you don't have a copy of it yet, you can clone it or just update it if you already have it:
Do not confuse the `mongodb-crypt` dependency with `libmongocrypt`: `libmongocrypt` is the companion C library used by the drivers to encrypt and decrypt your data. We need this library to run CSFLE, so I added its Java binding to the `pom.xml` file of this project.
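As a sketch, the Maven dependency looks like the following (the version number here is an assumption; check Maven Central for the current release):

```xml
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongodb-crypt</artifactId>
    <version>1.2.1</version>
</dependency>
```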
In this quickstart tutorial, I will show you the CSFLE API using the MongoDB Java Driver. I will show you how to:
- create and configure the MongoDB connections we need.
- create a master key.
- create Data Encryption Keys (DEK).
- create and read encrypted documents.
But to keep it short, the following command should get you up and running in no time:
This is the output you should get:
Let's take an in-depth look to understand what is happening.
CSFLE looks complicated, like any security and encryption feature, I guess. Let's try to make it simple in a few words.
- You can use one DEK for your entire cluster or a different DEK for each field of each document in your cluster. It's up to you.
- The DEKs are stored in a collection in a MongoDB cluster, which does not have to be the same cluster as the one that contains the encrypted data. The DEKs are stored encrypted. They are useless without the master key, which needs to be protected.
- You can use the manual (community edition) or the automated (enterprise advanced or Atlas) encryption of fields.
- The decryption can be manual or automated. Both are part of the community edition of MongoDB. In this blog post, I will use manual encryption and automated decryption to stick with the community edition of MongoDB.
European laws enforce data protection and privacy. Any oversight can result in massive fines.
For example, CSFLE could be a great way to enforce the right-to-be-forgotten rule of GDPR. If a user asks to be removed from your systems, the data must be erased from your production cluster, of course, but also the logs, the dev environment, and the backups... And let's face it: Nobody will ever remove this user's data from the backups. And if you ever restore or use these backups, this can cost you millions of dollars/euros.
But now... encrypt each user's data with a unique Data Encryption Key (DEK) and, to "forget" a user forever, all you have to do is lose the key. So, saving the DEKs on a separate cluster and enforcing a low retention policy on this cluster will ensure that a user is truly forgotten forever once the key is deleted.
If the primary motivation is just to provably ensure that deleted plaintext user records remain deleted no matter what, then it becomes a simple timing and separation-of-concerns strategy. The most straightforward solution is to move the key vault collection to a completely different database or cluster, configured with a much shorter backup retention. FLE does not assume your encrypted key vault collection is co-resident with your active cluster, or that it has the same access controls and backup history, just that the client can, when needed, make an authenticated connection to that key vault database. It is important to note, though, that with a shorter backup cycle, in the event of some catastrophic data corruption (malicious, intentional, or accidental), all keys for that database (and therefore all encrypted data) are only recoverable to the point in time that the shorter key vault backup can restore.
More trivially, in the event of an intrusion, any stolen data will be completely worthless without the master key and would not result in a ruinous fine.
Generating a new master key is as simple as this:
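A minimal sketch of that generation step (class name is mine; the local KMS provider expects 96 bytes of key material):

```java
import java.security.SecureRandom;

public class GenerateMasterKey {
    public static void main(String[] args) {
        // The local KMS provider expects 96 bytes of random key material.
        byte[] masterKey = new byte[96];
        new SecureRandom().nextBytes(masterKey);
        System.out.println("Generated a master key of " + masterKey.length + " bytes.");
    }
}
```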
But you most probably just want to do this once and then reuse the same one each time you restart your application.
Here is my implementation: it stores the master key in a local file the first time and then reuses it on each restart.
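The repository holds the exact implementation; a minimal sketch of the same idea (class name and file handling are my own) could look like this:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;

public class MasterKeyStore {

    // Generates the master key on the first run and reuses the same
    // bytes on every restart by persisting them to a local file.
    public static byte[] loadOrGenerateMasterKey(Path file) throws IOException {
        if (Files.exists(file)) {
            return Files.readAllBytes(file);
        }
        byte[] masterKey = new byte[96]; // local KMS provider expects 96 bytes
        new SecureRandom().nextBytes(masterKey);
        Files.write(file, masterKey);
        return masterKey;
    }
}
```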
In this simple quickstart, I will only use a single master key, but it's totally possible to use multiple master keys.
Here is the configuration for a local KMS:
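A sketch of what this configuration typically looks like with the Java driver (the helper class name is mine): the `kmsProviders` map is what the driver's encryption settings builders expect, with the master key under `"local" -> "key"`.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalKmsConfig {

    // Builds the kmsProviders map used by the driver's encryption settings:
    // {"local": {"key": <96 random bytes>}}
    public static Map<String, Map<String, Object>> kmsProviders(byte[] masterKey) {
        Map<String, Object> localProvider = new HashMap<>();
        localProvider.put("key", masterKey);
        Map<String, Map<String, Object>> kmsProviders = new HashMap<>();
        kmsProviders.put("local", localProvider);
        return kmsProviders;
    }
}
```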
We will need to set up two different clients:
- The first one ─ `ClientEncryption` ─ will be used to create our Data Encryption Keys (DEK) and encrypt our fields manually.
- The second one ─ `MongoClient` ─ will be the more conventional MongoDB connection that we will use to read and write our documents, with the difference that it will be configured to automatically decrypt the encrypted fields.
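As a sketch only (the connection string, key vault namespace, and variable names are assumptions, and this requires the driver and `mongodb-crypt` on the classpath), the two clients can be configured along these lines:

```java
// Key vault namespace: database "encryption", collection "__keyVault" (assumed names).
String keyVaultNamespace = "encryption.__keyVault";

// 1. ClientEncryption: used to create DEKs and encrypt fields manually.
ClientEncryption clientEncryption = ClientEncryptions.create(
        ClientEncryptionSettings.builder()
                .keyVaultMongoClientSettings(MongoClientSettings.builder()
                        .applyConnectionString(new ConnectionString("mongodb://localhost"))
                        .build())
                .keyVaultNamespace(keyVaultNamespace)
                .kmsProviders(kmsProviders)
                .build());

// 2. MongoClient: a regular connection configured for automatic decryption.
//    bypassAutoEncryption(true) keeps encryption manual (community edition)
//    while decryption stays automatic.
MongoClient client = MongoClients.create(
        MongoClientSettings.builder()
                .applyConnectionString(new ConnectionString("mongodb://localhost"))
                .autoEncryptionSettings(AutoEncryptionSettings.builder()
                        .keyVaultNamespace(keyVaultNamespace)
                        .kmsProviders(kmsProviders)
                        .bypassAutoEncryption(true)
                        .build())
                .build());
```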
You don't have to reuse the same connection string for both connections. It would actually be a lot more "GDPR-friendly" to use separate clusters, so you can enforce a low retention policy on the Data Encryption Keys.
The first thing you should do before you create your first Data Encryption Key is to create a unique index on the key alternate names to make sure that you can't reuse the same alternate name on two different DEKs.
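With the Java driver, that unique partial index can be created along these lines (the key vault database and collection names are assumptions):

```java
// Partial unique index on keyAltNames in the key vault collection,
// so two DEKs can never share the same alternate name.
MongoCollection<Document> keyVault = client.getDatabase("encryption")
                                           .getCollection("__keyVault");
keyVault.createIndex(Indexes.ascending("keyAltNames"),
        new IndexOptions().unique(true)
                          .partialFilterExpression(Filters.exists("keyAltNames")));
```

The partial filter expression is needed because DEKs without any `keyAltNames` would otherwise all collide on the same missing value.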
These names will help you "label" your keys to know what each one is used for ─ which is still totally up to you.
In my example, I chose to use one DEK per user. I will encrypt all the fields I want to secure in each user document with the same key. If I want to "forget" a user, I just need to drop that key. In my example, the user names are unique, so I'm using them for my `keyAltNames`. It's a great way to enforce GDPR compliance.
Let's create two Data Encryption Keys: one for Bobby and one for Alice. Each will be used to encrypt all the fields I want to keep safe in my respective user documents.
We get a little help from this private method to make my code easier to read:
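As a sketch of both steps (variable names are mine), `createDataKey` produces each DEK and the private helper wraps the `DataKeyOptions`:

```java
// Create one DEK per user, labelled with the user's name as keyAltName.
BsonBinary bobbyKeyId = clientEncryption.createDataKey("local", keyOptions("Bobby"));
BsonBinary aliceKeyId = clientEncryption.createDataKey("local", keyOptions("Alice"));

// Small helper to keep the code readable.
private static DataKeyOptions keyOptions(String altName) {
    return new DataKeyOptions().keyAltNames(List.of(altName));
}
```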
Here is what Bobby's DEK looks like in my key vault collection:
As you can see above, the `keyMaterial` (the DEK itself) is encrypted by the master key. Without the master key to decrypt it, it's useless. Also, you can identify that it's Bobby's key in the `keyAltNames` field.
Now that we have an encryption key for Bobby and Alice, I can create their respective documents and insert them into MongoDB like so:
Here is what Bobby's and Alice's documents look like in my collection of users:
In my example, if I want to be able to retrieve users by phone numbers, I must use the deterministic algorithm. As a phone number is likely to be unique in my collection of users, it's safe to use this algorithm here.
In my example, the blood type has a low cardinality, and it doesn't make sense to search in my user collection by blood type anyway, so it's safe to use the random algorithm for this field.
Also, Bobby's medical record must be very safe. So, the entire subdocument containing all his medical records is encrypted with the random algorithm as well and won't be used to search Bobby in my collection anyway.
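A sketch of the two manual encryption calls (the field values follow the examples above; the algorithm strings are the driver's standard identifiers):

```java
// Deterministic: same input + same key => same ciphertext, so equality
// queries on the phone number still work.
BsonBinary encryptedPhone = clientEncryption.encrypt(
        new BsonString("(123) 456-7890"),
        new EncryptOptions("AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic")
                .keyAltName("Bobby"));

// Random: stronger, but the field can no longer be queried.
BsonBinary encryptedBloodType = clientEncryption.encrypt(
        new BsonString("A+"),
        new EncryptOptions("AEAD_AES_256_CBC_HMAC_SHA_512-Random")
                .keyAltName("Bobby"));
```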
As mentioned in the previous section, it's possible to search documents by fields encrypted with the deterministic algorithm.
Here is how:
I simply encrypt the phone number I'm looking for again, with the same key, and I can use the resulting `BsonBinary` in my query to find Bobby.
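A sketch of that lookup (the field name `phone` and the collection variable are assumptions):

```java
// Encrypt the query value deterministically with Bobby's key...
BsonBinary encryptedPhoneQuery = clientEncryption.encrypt(
        new BsonString("(123) 456-7890"),
        new EncryptOptions("AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic")
                .keyAltName("Bobby"));

// ...and use the resulting ciphertext in a regular equality query.
Document doc = usersCollection.find(Filters.eq("phone", encryptedPhoneQuery)).first();
```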
If I output the `doc` document as a string, I get:
As you can see, the automatic decryption worked as expected: I can see my document in clear text. To find this document, I could use the `age` or the phone number, but not the blood type, as it is encrypted with the random algorithm.
Now let's put CSFLE to the test. I want to be sure that if Alice's DEK is destroyed, Alice's document is lost forever and can never be restored, not even from a backup. That's why it's important to keep the DEKs and the encrypted documents in two different clusters that don't have the same backup retention policy.
Let's retrieve Alice's document by name, but let's protect my code in case something "bad" has happened to her key...
If her key still exists in the database, then I can decrypt her document:
Now, let's remove her key from the database:
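Removing the DEK is a plain delete in the key vault collection; as a sketch (the key vault database and collection names are assumptions):

```java
// Dropping Alice's DEK makes every field encrypted with it unreadable forever.
client.getDatabase("encryption")
      .getCollection("__keyVault")
      .deleteOne(Filters.eq("keyAltNames", "Alice"));
```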
In a real-life production environment, it wouldn't make sense to read her document again; and because we are all professional and organised developers who like to keep things tidy, we would also delete Alice's document along with her DEK, as this document is now completely worthless for us anyway.
The driver caches the Data Encryption Keys after retrieving them from the key vault. This cache is very important because, without it, multiple back-and-forth round trips would be necessary to decrypt each document. It's critical to prevent CSFLE from killing the performance of your MongoDB cluster.
So, to make sure I'm not using this cache anymore, I'm creating a brand new `MongoClient` (still with auto-decryption settings) for the sake of this example. But of course, in production, it wouldn't make sense to do so.
Now if I try to access Alice's document again, I get the following `MongoException`, as expected:
CSFLE is the ultimate security feature to ensure the maximum level of security for your cluster. Not even your admins will be able to access the data in production if they don't have access to the master keys.
There is a lot of flexibility in the implementation of CSFLE: You can choose to use one or multiple master keys, same for the Data Encryption Keys. You can also choose to encrypt all your phone numbers in your collection with the same DEK or use a different one for each user. It's really up to you how you will organise your encryption strategy but, of course, make sure it fulfills all your legal obligations. There are multiple right ways to implement CSFLE, so make sure to find the most suitable one for your use case.