Java Meets MongoDB: A Deep Dive into Client-Side Field Level Encryption
00:00:00
The video begins with an introduction to the speaker and the topic. The speaker explains what Client-Side Field Level Encryption (CSFLE) is and why it matters for protecting MongoDB databases from several types of attacks.

00:06:42
The speaker discusses the two types of keys used in CSFLE: the data encryption keys and the customer master keys. He also explains the role of the Key Management System (KMS) and of the key vault collection in storing and managing these keys.

00:13:23
The speaker demonstrates how to set up CSFLE, including creating a unique index for the key vault, generating the master key, and creating the MongoDB clients for encryption and decryption.

00:20:04
The speaker shows how to create and store a data encryption key for each user in the MongoDB database. He also explains how to insert encrypted documents into the database.

00:26:46
The speaker demonstrates how to retrieve encrypted documents from the database. He explains the difference between the deterministic and random encryption algorithms and when to use each.

00:33:28
The speaker concludes the tutorial by discussing the future of CSFLE, mentioning the Queryable Encryption feature in MongoDB 7.0. He also provides resources for further learning and experimentation with CSFLE.

The primary focus of the video is to provide a tutorial on how to implement Client-Side Field Level Encryption in MongoDB using Java.
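The setup step summarized above (generating the master key and handing it to a local KMS provider) can be sketched with the Java standard library alone. This is a minimal sketch, not the code from the video's repository: the class name `MasterKeySketch`, its method names, and the `master_key.txt` file name are assumptions made for illustration. The 96-byte key size and the `local` / `key` map shape follow what the speaker describes in the transcript.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; names are illustrative, not taken from the repository.
class MasterKeySketch {
    // The transcript mentions a master key size of 96 (bytes) for the local provider.
    static final int MASTER_KEY_SIZE = 96;

    // Generate a fresh master key from a cryptographically secure random source.
    static byte[] generateMasterKey() {
        byte[] key = new byte[MASTER_KEY_SIZE];
        new SecureRandom().nextBytes(key);
        return key;
    }

    // Reuse the key stored in the file if it exists; otherwise create one and save it.
    static byte[] generateOrRetrieveMasterKey(Path file) {
        try {
            if (Files.exists(file)) {
                return Files.readAllBytes(file);
            }
            byte[] key = generateMasterKey();
            Files.write(file, key);
            return key;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Map shape: provider name -> provider options. The "local" provider takes one "key" entry.
    static Map<String, Map<String, Object>> localKmsProviders(byte[] masterKey) {
        Map<String, Object> local = new HashMap<>();
        local.put("key", masterKey);
        Map<String, Map<String, Object>> providers = new HashMap<>();
        providers.put("local", local);
        return providers;
    }

    public static void main(String[] args) {
        // Assumed file name; storing the key next to the code is for tutorials only.
        byte[] key = generateOrRetrieveMasterKey(Path.of("master_key.txt"));
        System.out.println("Master key size: " + key.length + " bytes");
        System.out.println("KMS providers: " + localKmsProviders(key).keySet());
    }
}
```

The nested map is what would be passed as the KMS providers when building the encryption clients. As the speaker stresses, a key file sitting next to the application is acceptable only for a demo; a real deployment would rely on an external KMS.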
🔑 Key Points
- CSFLE is a security feature that protects MongoDB databases from several types of attacks, including database privilege abuse, network snooping, data theft, and direct access to the RAM of the database host.
- CSFLE involves creating and storing encryption keys, encrypting fields in documents on the client side, and decrypting them when the documents are retrieved.
- Two encryption algorithms can be used: deterministic and random. Deterministic encryption suits fields whose values are unique or nearly unique (a phone number, for example) and keeps equality queries on the field possible. Random encryption should be used for fields with low cardinality (a blood type, for example) to prevent frequency attacks, at the cost of not being able to query by that field.
- It's recommended to use a Key Management System (KMS) to store the master key securely.
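- MongoDB provides two ways to encrypt and decrypt documents: automatic encryption and explicit encryption. Automatic encryption is part of MongoDB Atlas and MongoDB Enterprise Advanced, while explicit encryption, combined with automatic decryption, is compatible with MongoDB Community Edition.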
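To make the deterministic versus random point above concrete, here is a small illustration built only on `javax.crypto`. It is a simplified sketch and not MongoDB's actual algorithm or the driver's API: the class `EncryptionModesSketch`, the 64-byte key layout, and the HMAC-derived IV are assumptions chosen to show the behavior, and the scheme has no authentication tag. The point it shows is that a deterministic scheme maps equal plaintexts to equal ciphertexts (queryable, but open to frequency analysis on low-cardinality fields), while a randomized scheme does not.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustration only: a toy scheme, not the algorithm used by MongoDB CSFLE.
class EncryptionModesSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Hypothetical 64-byte key: bytes 0-31 for AES-256, bytes 32-63 for deriving IVs.
    static byte[] newKey() {
        byte[] key = new byte[64];
        RANDOM.nextBytes(key);
        return key;
    }

    static String encrypt(byte[] key, String plaintext, boolean deterministic) {
        try {
            byte[] data = plaintext.getBytes(StandardCharsets.UTF_8);
            byte[] iv = new byte[16];
            if (deterministic) {
                // Same key and same plaintext give the same IV, hence the same ciphertext.
                Mac mac = Mac.getInstance("HmacSHA256");
                mac.init(new SecretKeySpec(Arrays.copyOfRange(key, 32, 64), "HmacSHA256"));
                iv = Arrays.copyOf(mac.doFinal(data), 16);
            } else {
                // A fresh random IV gives a different ciphertext on every call.
                RANDOM.nextBytes(iv);
            }
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(Arrays.copyOfRange(key, 0, 32), "AES"),
                    new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(data);
            // Output layout: IV followed by the encrypted bytes, Base64-encoded.
            byte[] out = Arrays.copyOf(iv, iv.length + body.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(byte[] key, String ciphertext) {
        try {
            byte[] in = Base64.getDecoder().decode(ciphertext);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE,
                    new SecretKeySpec(Arrays.copyOfRange(key, 0, 32), "AES"),
                    new IvParameterSpec(Arrays.copyOfRange(in, 0, 16)));
            return new String(cipher.doFinal(in, 16, in.length - 16), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // How many distinct ciphertexts an observer would see for a column of values.
    static int distinctCiphertexts(byte[] key, List<String> values, boolean deterministic) {
        Set<String> seen = new HashSet<>();
        for (String value : values) {
            seen.add(encrypt(key, value, deterministic));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        byte[] key = newKey();
        // Made-up sample column with only three distinct values.
        List<String> bloodTypes = List.of("A+", "A+", "O+", "A+", "B+", "O+");
        System.out.println("deterministic: "
                + distinctCiphertexts(key, bloodTypes, true) + " distinct ciphertexts");
        System.out.println("random: "
                + distinctCiphertexts(key, bloodTypes, false) + " distinct ciphertexts");
    }
}
```

With the deterministic mode, the number of distinct ciphertexts matches the number of distinct blood types, so counting how often each ciphertext appears leaks the distribution of the underlying values when a single key covers all documents. With the random mode, every ciphertext is expected to differ, which hides the distribution but also means a re-encrypted value can no longer be used in an equality query.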
Full Video Transcript
Hi everybody, I'm Maxim B. I'm a senior developer advocate at MongoDB, I'm based in France, and today I want to talk to you about MongoDB, Java, and Client-Side Field Level Encryption. In this video I will explain what CSFLE is exactly, with all the details and all the pieces you need to make it work, and I will show you an example using Java: a very basic tutorial with a very basic piece of code, so you can get started quickly if you like.

So first of all, what is CSFLE? CSFLE is one of many MongoDB security features you can use to protect your MongoDB database. As you can see, any kind of database can be attacked in different ways: you can have database privilege abuse, network snooping, data theft, or direct access to the RAM of the database host, with RAM scraping or things like that. As you can see, CSFLE is the only security feature that protects you from all those different types of attacks. Of course, that doesn't mean it is the only one you should use. You should actually use all of them: you should implement TLS/SSL network transport encryption, you should use encryption at rest, and of course role-based access control, with users, passwords, and so on. You need all those features to have a complete set of protections across your MongoDB clusters, so you are protected from basically any type of attack.

So let's see exactly what CSFLE is and what we need to make it work. First of all, let's look at a quick example so you can picture it in your head. This example is just a way to access, with a find, a document which is stored encrypted in MongoDB. An encrypted document looks like this; in the end, this is how the document is stored inside MongoDB. As you can see, the SSN, the email, and the mobile are binary entries and you can't read them. Basically, if you are a pirate and you steal that, you cannot do anything with it. Even if you steal the data
encryption keys, you cannot use them to retrieve the real data hidden by the encryption algorithm unless you have the customer master key. The master key encrypts the data encryption keys, and all of that is hidden and managed in the key management system, the KMS. So if you don't have the master key, you basically cannot read anything. That's the key of this whole system.

If we go through the algorithm of how a document is retrieved, you can see that the user sends a find command with the SSN it's looking for, and then this field is encrypted using the master key and the data encryption key. We send this data encrypted, so when it leaves the MongoDB driver it's already encrypted: on the network, in the RAM, everywhere on the MongoDB side, it's already encrypted. We look for this encrypted value in the database, we find the document, and we return it to the client once it has been decrypted by the MongoDB driver. To be precise, it's actually decrypted by libmongocrypt, which is a companion library that sits next to the driver and is used by the driver directly.

Cool. I talked about keys a bit, so let's see exactly what I mean by that. I mentioned two types. You have the data encryption keys, also known as DEKs, which encrypt the fields directly in your documents. And you have the customer master keys; you can have several, but usually you have one. The master key encrypts the data encryption keys. This is like the master password, if you like: without it you cannot do anything. That's also why it's not recommended to store your customer master key or keys near your server, near your driver, or near your application backend, which is what we call the client in this solution. That's why we recommend using key management systems, also known as KMS. You can host that in Amazon, in GCP, in Azure, in different places. You can also manage your
own if you like, but we usually recommend using those for extra security. And finally we have the key vault collection, which is where we store the data encryption keys. The data encryption keys are stored in the key vault collection encrypted. That's why I was saying that the data encryption key is not enough to read the documents: you need the associated customer master key to decrypt the data encryption keys, so you can finally decrypt the documents using those data encryption keys.

So I have an example in Java, and I want to show you a few things before I actually execute the code and we dive into it. Let's see exactly what those parts are in my database. I have a local database running right now, so let's see what my documents look like. In my example I have two persons, Bobby and Alice, and they both have one document with medical records that I want to keep safe. They each have one data encryption key associated with them: there is one data encryption key for Bobby and one for Alice, and each key encrypts their own document. That's how I designed the system. You can totally choose a different way to organize your master keys, data encryption keys, and finally documents and fields; it's totally up to you. But that's how I chose to do it here, to be GDPR friendly, because that's my example if you read the blog post associated with this video.

Here you see the encrypted documents that I have in my collection. And just for fun, I also inserted the Bobby document in clear, just so you see the difference: that would be a completely clear document in my database. So here, for example, I already abused my rights as an admin to read some private information about that person. I now know his blood type, which I'm not supposed to know, and I also know that this person, for example,
has a bad heart, which I'm not supposed to know. So that's an example of how privileges can be abused to extract information. Ideally, it should look like this in the database, and to do that we need data encryption keys, which are stored, as I told you, in the vault. For me, the vault is right here. It doesn't have to be in the same MongoDB cluster, it can be stored elsewhere, but for simplicity I chose to store them next to each other in the same MongoDB deployment.

Here you can see a data encryption key. As you can see, there is a key material, in a binary, base64 format, and I can't read it or use it like this. The only way to use it is to use the master key to decrypt this key, and then I can use it. Also note that here I have keyAltNames, and it's Bobby. Alice's key is not here, because in the example I actually remove this key to prove that once the key is removed, I cannot decrypt the document anymore. That proves the document is secured and completely unusable once I have forgotten the key, which is GDPR friendly: it's the right to be forgotten in GDPR. And here you see that this key is associated with Bobby, so anything I want to read or write for Bobby, I basically need to use this key.

All right, let's carry on with our little story. We now have data encryption keys and customer master keys, which encrypt the data encryption keys. We have a key vault and a key management system. Let's talk a bit more about the key management system for a second. I told you we have several supported systems: here you can see Amazon, Azure, and Google Cloud Platform. You can also use KMIP, and the local key provider, which is what I'm going to use in this example, again for the sake of simplicity. The KMS is responsible for two things: create and store your customer master keys and keep
them safe, and create and encrypt your data encryption keys. We will see that in the code as well, and how I do it.

One more thing before we jump into the code. MongoDB provides two ways to encrypt and decrypt documents. One is called automatic encryption, and it is part of the MongoDB Atlas and MongoDB Enterprise offering. In this example I'm not using it; I'm just using the explicit encryption that you can see mentioned here. So I'm not using this part here, which was called mongocryptd and is now actively being replaced by this, which is called the Automatic Encryption Shared Library.

So this is how you write a document with automatic encryption. In the code you're going to see in a minute, I'm not using the automatic part, because it relies on a JSON schema. Basically, on the client side you provide a JSON schema and you say: this field is going to be encrypted with this key, this field with that one, with this algorithm, and so on. Once you have this, the Automatic Encryption Shared Library is capable of automatically encrypting the fields in the document for you and, vice versa, of decrypting them when you retrieve those documents. The decryption part is actually fully automated already, so you don't need to have this in place to decrypt the documents, and I'm actually using this automatic decryption in the example I provide. So only this part here is part of MongoDB Atlas and MongoDB Enterprise Advanced. In my example I'm not using it, so I'm compatible with MongoDB Community Edition.

Just for the record, all the code you're going to see today in my example is available in this repository on GitHub, in the mongodb-developer organization, and it's part of the Java Quick Start repository. The piece of code I'm going to show you today, it's
located in the source main folder, under quick start, and there is a folder dedicated to CSFLE. So let's jump right into the code; you can see we are going to visit those three classes.

All right, let's see my example. First of all, the console decoration: it's just for decoration, so you won't see a lot of that. Then I have two classes. One is really my main example, where I set up a few things: it's a demo, which is really my whole main. I have a demo and I run things one by one, and we are going to go through all that in a minute. And I have a connection helper, with helper functions that I use in my demo to simplify the code, so I can factorize, reuse, and so on. Of course we're going to see a bit of that as well.

So let's dive in. First of all, let's execute the code to see what's happening and what the code is actually doing. Let's scroll back up and see step by step what's happening. The first thing you see that I need is the master key. It tells you that an existing master key was found in the master_key.
txt file. So here in my project I actually have a master key that has been created. If I open the file, you can see it's a binary file; I cannot read it like this. If this file does not exist in your project yet, when you check out the repository and execute the code, it will create a brand new master key file for you with a brand new secret master key, which you are of course supposed to secure carefully. It's not very secure if it's just lying right here on my server, next to the code. It's like leaving the keys on the car, so you don't really want to do that. If you get hacked and they can steal the key along with the code and everything needed to access the database, you basically gave everything away. That's why we say it's better to use a KMS, a key management system.

OK, so I reuse a key, and you see that I now have a master key, which I can print because it's just a bunch of bytes.

Then comes the initialization. I need a few things before I do anything. First I need to create my key management system, my local key management system for the sake of simplicity. Then I need to create two MongoDB clients. One is called the encryption client; that's the one I use to create my data encryption keys and store them in the vault. And then I need a MongoDB client which is CSFLE aware and capable of doing automatic decryption. Not automatic encryption, remember, because I don't want to use MongoDB Atlas or Enterprise in this example. I'm just using explicit encryption, which means it relies on me to encrypt the fields in the document; if I don't do that, they can be written in clear in my database. Finally, I'm just cleaning the server, because it's a tutorial: I'm removing the vault, I'm removing the user collection, I'm removing everything so it's clean for the next execution.

OK, the first thing I need here
is to create a unique index on keyAltNames. What does that mean? If I come back to MongoDB Compass for a second, you can see my vault. I told you about this keyAltNames field: it's a way for me to uniquely identify my key. I don't want two keys with Bobby, otherwise I would not know which one to use to encrypt and decrypt the fields for my Bobby person. So first of all, to make sure I can access and retrieve this key easily from the vault, I need an index to make it fast. But I also need to make it unique, so I'm sure there is no other key with Bobby in this array of keyAltNames. For this, the index is created directly in the code. It is unique and partial: as you can see, I also have a filter to make sure that if no keyAltNames is provided, the document is not indexed, because I cannot retrieve it with this field anyway. It's not supposed to be useful, but it's here just in case.

Once I have this unique index on keyAltNames, I'm sure I can use my data encryption keys correctly. Even if I make a mistake and try to create another key for Bobby, it will be refused by the server, by the MongoDB cluster I mean.

Then we can create two data encryption keys, one for Bobby and one for Alice. Then we can insert two documents, one for Bobby and one for Alice. In my example they are stored in the encrypted database, in the users collection, and you see those two documents in the collection. And finally I can retrieve Bobby's document. You can see I can print it in the console here in clear, which means I have done the decryption part correctly, because it's definitely encrypted in the database and not in my console. Same for Alice. And in the end, just to prove that it's working correctly, I remove Alice's key from the vault, which is what you see here:
Alice is gone, her key is not here anymore. Once the key is removed, I recreate the clients, because there is a cache: the data encryption keys are actually cached, and of course I don't want to use this cache here, I want to completely forget about that key. So, just to be sure, I recreate the client with automatic decryption, and finally I try to read the document again. As you can see, I get an error here. I have a try/catch, I catch the exception, and I report that there was an exception in libmongocrypt and that I was not able to decrypt those fields anymore. So that's the whole example.

Now let's see exactly how it works in the code. Let's jump back to the first big demo. I create a few things here at the top, just so you are aware. I create Bobby and Alice, just two strings. I create a few namespaces, including the encrypted namespace, which is my encrypted database and its users collection. I have the name of the master key file, I have "local" for my local key management system, and I also have deterministic and random, which are the two algorithms I can use for CSFLE. I'm going to cover those in a second, when we reach the document and encryption part, but just so you know, I have two algorithms: deterministic and random. You can also see here the size of the master key, which is 96 bytes.

OK, cool. We read the console output, so now let's go in order and see how it was achieved. First of all, I need to retrieve that master key. It's done here. I'm going to close this so you can see more code. You can see "generate or retrieve master key from file", so let's open this piece of code. If the file does not exist, then I create a new key and I save that master key to the file, just like this, with a FileOutputStream. Nothing really important. Generating the master key is actually very simple: you create a new byte array and,
with a random, which is a new SecureRandom in Java, you use the nextBytes method with the byte array initialized with the size of the key, and you get a new customer master key, which is the one you can see in my log here. So that's the master key part: that's how you get the master key and how you retrieve it from the file.

Then comes my connection helper. Once I have the master key, I can create my connection helper, and in its constructor the first thing I do is generate three things: my KMS provider, my local KMS provider; my encryption client; and my client, the one that is capable of reading and writing documents.

Generating the KMS provider is very simple: it's just a HashMap. It's a map from a string to another map, because you can actually use different types of KMS in the same system. I can use local along with other providers, and I can also have several keys in my map. So you need to create an object like this: a HashMap with a HashMap in it. For me, it's "local", with "key", which finally points to my master key. So that's my local KMS, which is not recommended for production environments.

Next I can create my client encryption. It's not very complicated. I'm going to zoom out a little so you can see as much as possible. I need to provide the connection string for MongoDB, so here it's just MongoDB on localhost. The client encryption is responsible for creating the data encryption keys and retrieving them, and this is done by providing the configuration for the key vault. If the key vault is stored in another place, that's where you say where it's going to be stored. You have the namespace of
the key vault and the KMS provider, and you just build, and that's it: you create the client like this.

Once you have that client, you can create the other one, the MongoDB client. This is the one you are familiar with. It's almost normal, except that this time you have to pass an extra parameter, the auto encryption settings. These are the usual settings where you pass the connection string, and you add this one here. It contains my vault again, my KMS provider, and this option here, which is very important: bypass auto encryption. With it, I say that I don't want to use these tools. Remember the automatic part with the JSON schema and mongocryptd, or now, in the most recent versions, the Automatic Encryption Shared Library. This is the option for MongoDB Community Edition. By this I mean that I'm going to use explicit encryption instead of automatic encryption. So again I put all this together, pass the options, and create my client, and that's all I need for now.

Back to my code in the client-side field level encryption demo. Then I just clean the cluster. This function is just silly: it drops the vault and the encrypted collection, and that's it. I'm just cleaning the whole cluster so I'm clean for the next execution.

Next I retrieve my clients, the encryption client and the regular client, and I also retrieve my vault and my user collection. So here I'm just initializing my variables. Then I can create the unique index I need for the key vault. It's done like this. I'm going to get rid of this so you can see better, and zoom a little more. You see I'm calling create index on the key vault: I'm creating an ascending index on keyAltNames, which, you remember, is this array here in the vault, and I pass the options so the index is unique and with a partial
filter on keyAltNames existing. Great.

Then I can create my data encryption keys. I do it like this: on the encryption client I can use the method called createDataKey, to which I need to pass "local". I have to say which KMS I'm going to use, so here it's the local one, and I create a key for Bobby and a key for Alice. Here I'm just creating the option for keyAltNames, which is a list of all the alternative names I can provide if I want to, and here I just need Bobby and Alice. I print the keys in the output, and that's what you saw when I executed this. By the way, those IDs here, this one for example, e19 something, that's the one you find here, e19 something. That's the UUID that uniquely identifies your data encryption key in your vault.

Next, I can insert my encrypted documents for Bobby and Alice. Let's see what we have here. As you can see, I'm creating two documents, one for Bobby and one for Alice. I use my collection's insertMany function so I can insert a list with Bobby and Alice, retrieve the size, and print the number of documents that have been inserted, which is what we see here as well: two documents have been inserted.

Let's see how I create the document for Bobby, for example. It's exactly the same for Alice; they are actually both on the same screen. For Bobby, you can see that for the phone I use the encryption client, and this time I use the encrypt method. I create a new BsonString which contains the phone number, and I use the deterministic algorithm with Bobby's key. Deterministic is an option: new EncryptOptions with deterministic, with the key alt name of the key. With this option I say which key I want to use and which algorithm I want to use. And so now we have reached the part where I need
to explain why we have deterministic for the phone number, for example, and random for the blood type. This is a security layer, in case you want to prevent a certain kind of attack on encrypted data. Let's say that I use a deterministic way to encrypt the blood type. A blood type has a very limited cardinality: I think there are eight different blood types for humans. If I use a deterministic algorithm, and if I use the same key to encrypt all my documents, which is not the case here by the way, but suppose you use only one key to encrypt all the fields, then A+ for everybody will be equal to the same value in all the documents. That means that if I know the frequency of the different blood types in a population, I can take my database and do a frequency analysis of that field, and I can just match the different values: this value is 20%, so this encrypted value must match A+, for example, and this one is 15%, so this must be B+. This is the type of frequency attack that you can do when you have a low cardinality on a field like blood type.

But not on a telephone number, for example, because it is very likely that all the telephone numbers in your database will be unique enough. Maybe not unique, but unique enough that you cannot do a frequency attack on a telephone number. Same for a social security number, or for any type of user identifier, an email or something like this: they will almost all be different in your database, enough that you cannot be attacked this way. That's why I can use deterministic for the phone, but I have to use random for the blood type, just to be safe.

It also means that, because I use the random algorithm here for the blood type, I cannot query this document by blood type
because if I try to re-encrypt this blood type with Bobby's key, I'm going to get a different value, and if I look for this value in the database, it won't exist. So if I use the random algorithm, I can only decrypt that value; I cannot re-encrypt and recreate it, which means I cannot query by that value.

That being said, there is a new thing being released currently in the most recent version, MongoDB 7.0. CSFLE was first released in 4.2, so it's not new. We are at MongoDB 7.0 right now as I'm shooting this video, and there is a new thing called Queryable Encryption. It's not part of this video, but if you want to know more, check out Queryable Encryption, because you're going to learn a lot more about this topic. It's basically a new way to encrypt and decrypt your fields.

Cool. And finally you have my medical entry: a test with a bad result for Bobby. Of course I want to keep this very, very safe and very, very private. For this there is no point for me to query by this field. If I query Bobby's document, I'm probably going to retrieve it by his name or maybe his phone number; I'm definitely not going to look for Bobby by his blood type or his medical entry. So here I encrypt the field again, which is a list of elements, a list of BSON documents. It's a subdocument in my object. If you don't remember, I can show you the document in clear: as you can see here, it's an array of subdocuments that have a test and a result. So I encrypt all this with the random algorithm again. Same for Alice: she doesn't have a medical entry, but that's fine. She has a blood type and she has a phone number, deterministic and random, same thing. In the end, when I retrieve those documents, you can see they are printed here in clear, Bobby first and Alice second.

Cool. So now I have inserted the documents, and I can read Bobby's. How do I do
that? Here I chose to query Bobby by his phone number. So again I re-encrypt the value provided by the client, with the same data encryption key and the same algorithm of course, and then I can just run a normal find on the user collection, with phone equal to my encrypted value, retrieve the first result, print it as JSON, and that's what you see in the console output. So that's how you can query by an encrypted field using the explicit encryption from CSFLE.

For Alice I went in a different direction. For Alice I have a try/catch, because the second time I try this it's going to fail; that's the final exception and error I have at the end, because I don't have the key anymore. Here I just do a normal find by name, and as you can see it works as well, of course, because her name is not encrypted in the database. It's in clear text, so I can just do a normal find query using MongoDB.

And finally, to finish the example, I remove Alice's key, her data encryption key. For this I use the vault collection, I do a deleteOne on the keyAltNames Alice, and I count how many keys have been removed. Looks like there is a little typo here. OK, and of course her key was removed.

And that's the part where there is a cache, just so you know. We don't retrieve all the data encryption keys all the time, because it would be expensive for the server and for the client to always run those queries. So there is a cache on the client side, because the data encryption keys don't change very often: once you create one, it's supposed to stay for a while. So I reset the cache here. Usually you don't need to do that in production; it's just because it's a tutorial and I want to test that if I delete the key, I cannot retrieve the document anymore. That's what I'm doing here: I'm resetting the connection, and when I try again to read Alice's document, when I try to do the find
here, the automatic decryption fails because I'm missing Alice's key. It ends up in a MongoDB exception, and that's the error I get at the end, in the final line of my console.

So that's it, you have the whole example here. It's very easy to execute. The only thing you need to know is that the connection string is passed here at the top; it's the very first line in the connection helper. You can see that the connection string is the system property mongodb.uri. If you look at my configuration, it is just passed using -D. This is part of the Maven options, or, if I'm not saying anything silly here, part of the VM parameters: -Dmongodb.uri=mongodb://localhost. So I'm just passing it like this. Of course, here I could use MongoDB Atlas, I could use any instance of MongoDB. I'm using Java 17 for this example, and that's really all I need to start it.

In the README, where is the README, here it is, you have all the different command lines to start all the other quick starts from this repository, and the one you are looking for, for CSFLE, is this one here at the bottom: -Dexec.mainClass with this class and -Dmongodb.uri with this value. And here you have an example with MongoDB Atlas, if that's your choice.

One last thing I want to show you is the pom.xml. It's pretty simple, I'm not using a lot of things. I'm using the MongoDB sync driver, and I told you about libmongocrypt, which is the companion library that does all the encryption and decryption. This part here is mandatory to do CSFLE. It works with MongoDB Community Edition: you can import it and it's free to use with Community Edition. Again, the part that is not free to use, unless you are using MongoDB Atlas or you have MongoDB Enterprise Advanced, is this part here,
the Automatic Encryption Shared Library, or, where is it, I lost it, mongocryptd. But if you're using mongocryptd, I highly recommend that you now transition to the Automatic Encryption Shared Library.

So that's it. I'm just using Logback; there is a small Logback configuration you can check out in the repository, and that's about it. I have a few details here, but nothing major. What I really wanted to show here is libmongocrypt, which is the mongodb-crypt artifact ID in Maven.

Cool. Well, thank you very much if you listened to this entire video. I really liked presenting Client-Side Field Level Encryption to you today, and I hope you learned something. Just so you know, there is a second video that I'm shooting very soon, for another repository that I did. This video is related to this blog post here, which is available in the MongoDB Developer Center, and I also created a new blog post called "How to Implement Client-Side Field Level Encryption in Java with Spring Data MongoDB". In that one we're using Spring Boot and Spring Data, and the blog post provides some code in this repository here, which is way more production ready. In that one I am using automatic encryption with the Automatic Encryption Shared Library. I'm not using Queryable Encryption, though. But the code that I provide is way more production ready and way more reusable. I'm using JSON schemas on the client side and on the server side to ensure that documents cannot be written without encryption.

And a final note about the future. This is the main documentation of CSFLE, but the future is here: the MongoDB Queryable Encryption feature is now generally available with MongoDB 7.0 and later. So if you haven't, please check it out and have a look at Queryable Encryption, which is related to CSFLE and is kind of its future, if you want my opinion. Cool.
So thank you very much to everybody who listened to this video. Have fun, and see you in the next video. Thank you, bye-bye.