Client-Side Field Level Encryption (CSFLE) in MongoDB with Golang
One of the many great things about MongoDB is how secure you can make your data in it. In addition to network and user-based rules, you have encryption of your data at rest, encryption over the wire, and, more recently, client-side encryption known as client-side field level encryption (CSFLE).
So, what exactly is client-side field level encryption (CSFLE) and how do you use it?
With field level encryption, you can choose to encrypt certain fields within a document, client-side, while leaving other fields as plain text. This is particularly useful because when viewing a CSFLE document with the CLI, Compass, or directly within Atlas, the encrypted fields will not be human readable. If the documents get into the wrong hands, those fields will be useless to the malicious user. However, when using the MongoDB language drivers with the same encryption keys, those fields can be decrypted and are queryable within the application.
In this quick start themed tutorial, we're going to see how to use MongoDB field level encryption with the Go programming language (Golang). In particular, we're going to be exploring automatic encryption rather than manual encryption.
There are a few requirements that must be met prior to attempting to use CSFLE with the Go driver.
- MongoDB Atlas 4.2+
- MongoDB Go driver 1.2+
- The libmongocrypt library installed
- The mongocryptd binary installed
This tutorial will focus on automatic encryption. While this tutorial will use MongoDB Atlas, you're going to need to be using version 4.2 or newer for MongoDB Atlas or MongoDB Enterprise Edition. You will not be able to use automatic field level encryption with MongoDB Community Edition.
The assumption is that you're familiar with developing Go applications that use MongoDB. If you want a refresher, take a look at the quick start series that I published on the topic.
To use field level encryption, you're going to need a little more than just having an appropriate version of MongoDB and the MongoDB Go driver. We'll need libmongocrypt, which is a companion library for encryption in the MongoDB drivers, and mongocryptd, which is a binary for parsing automatic encryption rules based on the extended JSON format.
Because of the libmongocrypt and mongocryptd requirements, it's worth reviewing how to install and configure them. We'll be exploring installation on macOS, but refer to the documentation for libmongocrypt and mongocryptd for your particular operating system.
There are a few ways to install the libmongocrypt library on macOS, the easiest being Homebrew. If you've got Homebrew installed, you can install libmongocrypt with the following command:
Just like that, the MongoDB Go driver will be able to handle encryption. Further explanation of the instructions can be found in the documentation.
Because we want to do automatic encryption with the driver using an extended JSON schema, we need mongocryptd, a binary that ships with MongoDB Enterprise Edition. The mongocryptd binary needs to exist on the computer or server where the Go application intends to run. It is not a development dependency like libmongocrypt, but a runtime dependency.
You'll want to consult the documentation on how to obtain the mongocryptd binary as each operating system has different steps.
For macOS, you'll want to download MongoDB Enterprise Edition from the MongoDB Download Center. You can refer to the Enterprise Edition installation instructions for macOS to install, but the gist of the installation involves extracting the TAR file and moving the files to the appropriate directory.
By this point, all the appropriate components for field level encryption should be installed or available.
Before we can start encrypting and decrypting fields within our documents, we need to establish keys to do the bulk of the work. This means defining our key vault location within MongoDB and the Key Management System (KMS) we wish to use for decrypting the data encryption keys.
The key vault is a collection that we'll create within MongoDB for storing encrypted keys for our document fields. The primary key within the KMS will decrypt the keys within the key vault.
For this particular tutorial, we're going to use a Local Key Provider for our KMS. It is worth looking into something like AWS KMS or similar, something we'll explore in a future tutorial, as an alternative to a Local Key Provider.
On your computer, create a new Go project with the following main.go file:
You'll need to install the MongoDB Go driver to proceed. To learn how to do this, take a moment to check out my previous tutorial titled Quick Start: Golang & MongoDB - Starting and Setup.
In the above code, we have a few variables defined as well as a few functions. We're going to focus on the kmsProviders variable and the createDataKey function for this particular part of the tutorial.
Take a look at the following createDataKey function:
In the above createDataKey function, we are first connecting to MongoDB. The MongoDB connection string is defined by the environment variable ATLAS_URI in the above code. While you could hard-code this connection string or store it in a configuration file, for security reasons, it makes a lot of sense to use environment variables instead.
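As a minimal sketch of that idea, the following reads ATLAS_URI and fails early when it is missing. The atlasURI helper and its lookup parameter are hypothetical names introduced here for illustration; they are not part of the tutorial's project or of the MongoDB driver.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// atlasURI resolves the connection string through the supplied lookup
// function (os.LookupEnv in normal use) and rejects missing or blank values.
// The helper name and signature are assumptions made for this sketch.
func atlasURI(lookup func(string) (string, bool)) (string, error) {
	uri, ok := lookup("ATLAS_URI")
	if !ok || strings.TrimSpace(uri) == "" {
		return "", errors.New("ATLAS_URI is not set")
	}
	return uri, nil
}

func main() {
	uri, err := atlasURI(os.LookupEnv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	// Avoid printing the URI itself because it usually embeds credentials.
	fmt.Println("connection string loaded, length:", len(uri))
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the helper easy to exercise without touching the real environment.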
If the connection was successful, we need to define the key vault namespace and the KMS provider as part of the encryption configuration options. The namespace is composed of the database name followed by the collection name. This is where the key information will be stored. The kmsProviders map, which will be defined later, will have local key information.
The CreateDataKey function will create the key information within MongoDB as a document.
We are choosing to specify an alternate key name of example so that we don't have to refer to the data key by its _id when using it with our documents. Instead, we'll be able to use the unique alternate name, which could follow a special naming convention. It is important to note that the alternate key name is only useful when using the AEAD_AES_256_CBC_HMAC_SHA_512-Random algorithm, something we'll explore later in this tutorial.
To use the createDataKey function, we can make some modifications to the main function:
In the above code, we are generating a random key. This random key is added to the kmsProviders map that we were using within the createDataKey function.
It is insecure to have your local key stored within the application or on the same server. In production, consider using AWS KMS or accessing your local key through a separate request before adding it to the Local Key Provider.
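To make the shape of these values concrete, here is a small sketch that builds a key vault namespace and a Local Key Provider map from a freshly generated key. It assumes the map[string]map[string]interface{} shape used for KMS providers in version 1.x of the Go driver and the 96-byte size required for a local master key. The helper names keyVaultNamespace and localKMSProviders are hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// localKeySize is the length a local master key is expected to have.
const localKeySize = 96

// keyVaultNamespace joins the database and collection names with a dot,
// which is the namespace format described in this tutorial.
func keyVaultNamespace(database, collection string) string {
	return database + "." + collection
}

// localKMSProviders wraps a local master key in the nested map shape
// assumed for the driver's KMS provider option. Hypothetical helper.
func localKMSProviders(localKey []byte) (map[string]map[string]interface{}, error) {
	if len(localKey) != localKeySize {
		return nil, fmt.Errorf("local master key must be %d bytes, got %d", localKeySize, len(localKey))
	}
	return map[string]map[string]interface{}{
		"local": {"key": localKey},
	}, nil
}

func main() {
	// Generate a random key for demonstration purposes only.
	localKey := make([]byte, localKeySize)
	if _, err := rand.Read(localKey); err != nil {
		panic(err)
	}
	providers, err := localKMSProviders(localKey)
	if err != nil {
		panic(err)
	}
	fmt.Println("namespace:", keyVaultNamespace("keyvault", "datakeys"))
	fmt.Println("providers configured:", len(providers))
}
```

Validating the key length up front tends to give a clearer error than letting a wrongly sized key fail later inside the encryption library.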
If you ran the code so far, you'd end up with a keyvault database and a datakeys collection which has a document of a key with an alternate name. That document would look something like this:
There are a few important things to note with our code so far:
- The localKey is random and does not persist beyond the runtime, which will result in key mismatches upon consecutive runs of the application. Either specify a non-random key or store it somewhere after generation.
- We're using a Local Key Provider with a key that exists locally. This is not recommended in a production scenario due to security concerns. Instead, use a provider like AWS KMS or store the key externally.
- The createDataKey function should only be executed when a particular key needs to be created, not every time the application runs.
- There is no strict naming convention for the key vault and the keys that reside in it. Name your database and collection however makes sense to you.
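One way to deal with the first point during local development is to generate the key once and reuse it on later runs. The sketch below is an assumption about how that could look, not the tutorial's own code: loadOrCreateLocalKey and the master-key.bin file name are hypothetical, and keeping the key in a file next to the application remains unsuitable for production for the reasons already given.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"os"
)

// masterKeySize is the length a local master key is expected to have.
const masterKeySize = 96

// loadOrCreateLocalKey returns the key stored at path, creating it with
// random bytes on the first call. Hypothetical helper for development use.
func loadOrCreateLocalKey(path string) ([]byte, error) {
	key, err := os.ReadFile(path)
	if err == nil {
		if len(key) != masterKeySize {
			return nil, fmt.Errorf("%s holds %d bytes, expected %d", path, len(key), masterKeySize)
		}
		return key, nil
	}
	if !errors.Is(err, os.ErrNotExist) {
		return nil, err
	}
	// No key file yet: generate one and persist it for later runs.
	key = make([]byte, masterKeySize)
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	// Mode 0600 keeps the file readable by the current user only.
	if err := os.WriteFile(path, key, 0o600); err != nil {
		return nil, err
	}
	return key, nil
}

func main() {
	key, err := loadOrCreateLocalKey("master-key.bin")
	if err != nil {
		panic(err)
	}
	fmt.Println("master key bytes:", len(key))
}
```

With the same key available on every run, data keys created earlier can still be decrypted, which avoids the mismatch described above.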
After we run our application the first time, we'll probably want to comment out the createDataKey line in the main function.
With the data key created, we're at a point in time where we need to figure out what fields should be encrypted in a document and what fields should be left as plain text. The easiest way to do this is with a schema map.
A schema map for encryption is extended JSON and can be added directly to the Go source code or loaded from an external file. From a maintenance perspective, loading from an external file is easier to maintain.
Take a look at the following schema map for encryption:
Let's assume the above JSON exists in a schema.json file which sits relative to our Go files or binary. In the above JSON, we're saying that the map applies to the people collection within the fle-example database. The keyId field within the encryptMetadata object says that documents within the people collection must have a string field called keyAltName. The value of this field will reflect the alternate key name that we defined when creating the data key. Notice the / that prefixes the value. That is not an error. It is a requirement for this particular value since it is a pointer.
The properties field lists fields within our document and in this example lists the fields that should be encrypted along with the encryption algorithm to use. In our example, only the ssn field will be encrypted while all other fields will remain as plain text.
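To illustrate the structure just described, the following sketch embeds a schema map in Go and lists which fields it marks for encryption. The JSON is an assumed reconstruction based on the description above (namespace fle-example.people, a keyId pointer of /keyAltName, and an encrypted ssn field), and it is parsed with the standard library's encoding/json purely for inspection, not with the driver's extended JSON functions. The encryptedFields helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// schemaJSON is an assumed example of a schema map for encryption.
const schemaJSON = `{
  "fle-example.people": {
    "bsonType": "object",
    "encryptMetadata": {
      "keyId": "/keyAltName"
    },
    "properties": {
      "ssn": {
        "encrypt": {
          "bsonType": "string",
          "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Random"
        }
      }
    }
  }
}`

// collectionSchema models only the parts of the schema inspected here.
type collectionSchema struct {
	Properties map[string]struct {
		Encrypt *struct {
			BSONType  string `json:"bsonType"`
			Algorithm string `json:"algorithm"`
		} `json:"encrypt"`
	} `json:"properties"`
}

// encryptedFields maps "namespace.field" to the algorithm for every
// property that carries an encrypt block. Hypothetical helper.
func encryptedFields(raw []byte) (map[string]string, error) {
	var schemas map[string]collectionSchema
	if err := json.Unmarshal(raw, &schemas); err != nil {
		return nil, err
	}
	out := map[string]string{}
	for ns, schema := range schemas {
		for field, prop := range schema.Properties {
			if prop.Encrypt != nil {
				out[ns+"."+field] = prop.Encrypt.Algorithm
			}
		}
	}
	return out, nil
}

func main() {
	fields, err := encryptedFields([]byte(schemaJSON))
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(fields))
	for name := range fields {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s -> %s\n", name, fields[name])
	}
}
```

A check like this can be handy in a test suite to confirm that a schema file still encrypts the fields you expect after it has been edited.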
There are two algorithms currently supported:
- AEAD_AES_256_CBC_HMAC_SHA_512-Random
- AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic
In short, the AEAD_AES_256_CBC_HMAC_SHA_512-Random algorithm is best used on fields that have low cardinality or don't need to be used within a filter for a query. The AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic algorithm should be used for fields with high cardinality or for fields that need to be used within a filter.
To learn more about these algorithms, visit the documentation. We'll be exploring both algorithms in this particular tutorial.
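The practical difference between the two modes can be shown with a toy example. The sketch below is not the AEAD_AES_256_CBC_HMAC_SHA_512 construction that MongoDB uses; it is a stand-in built from AES-GCM in the Go standard library, with hypothetical function names, that only demonstrates the property that matters for querying: randomized encryption yields a different ciphertext each time, while deterministic encryption yields the same ciphertext for the same input, which is what allows an equality match on an encrypted field.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// seal encrypts plaintext with AES-GCM and prepends the nonce to the result.
func seal(encKey, nonce, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return gcm.Seal(append([]byte{}, nonce...), nonce, plaintext, nil), nil
}

// encryptRandom uses a fresh random nonce, so equal inputs produce
// different ciphertexts.
func encryptRandom(encKey, plaintext []byte) ([]byte, error) {
	nonce := make([]byte, 12)
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return seal(encKey, nonce, plaintext)
}

// encryptDeterministic derives the nonce from an HMAC of the plaintext, so
// equal inputs produce equal ciphertexts. Illustration only.
func encryptDeterministic(encKey, macKey, plaintext []byte) ([]byte, error) {
	mac := hmac.New(sha256.New, macKey)
	mac.Write(plaintext)
	return seal(encKey, mac.Sum(nil)[:12], plaintext)
}

func main() {
	// Fixed demo keys; never hard-code real keys like this.
	encKey := bytes.Repeat([]byte{0x01}, 32)
	macKey := bytes.Repeat([]byte{0x02}, 32)
	value := []byte("example-ssn-value")

	r1, err := encryptRandom(encKey, value)
	if err != nil {
		panic(err)
	}
	r2, err := encryptRandom(encKey, value)
	if err != nil {
		panic(err)
	}
	d1, err := encryptDeterministic(encKey, macKey, value)
	if err != nil {
		panic(err)
	}
	d2, err := encryptDeterministic(encKey, macKey, value)
	if err != nil {
		panic(err)
	}
	fmt.Println("random ciphertexts equal:", bytes.Equal(r1, r2))
	fmt.Println("deterministic ciphertexts equal:", bytes.Equal(d1, d2))
}
```

The same property explains the cardinality guidance: when a field has only a handful of possible values, repeated identical ciphertexts can reveal which documents share a value, so the randomized mode is the safer choice there.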
If we wanted to, we could change the schema map to the following:
The change made in the above example has to do with the keyId field. Rather than declaring it as part of the encryptMetadata, we've declared it as part of a particular field. This could be useful if you want to use different keys for different fields.
Remember, the pointer used for the keyId will only work with the AEAD_AES_256_CBC_HMAC_SHA_512-Random algorithm. You can, however, use the actual key id for both algorithms.
With a schema map for encryption available, let's get it loaded in the Go application. Change the readSchemaFromFile function to look like the following:
In the above code, we are reading the file, which will be the schema.json file soon enough. If it is read successfully, we use the UnmarshalExtJSON function to load it into a bson.M object that is more pleasant to work with in Go.
By this point, you should have the code in place for creating a data key and a schema map defined to be used with the automatic client encryption functionality that MongoDB supports. It's time to bring it together to actually encrypt and decrypt fields.
We're going to start with the createEncryptedClient function within our project:
In the above code, we are making use of the readSchemaFromFile function that we had just created to load our schema map for encryption. Next, we are defining our auto encryption options and establishing a connection to MongoDB. This will look somewhat similar to what we did in the createDataKey function. When defining the auto encryption options, not only are we specifying the KMS for our key and vault, but we're also supplying the schema map for encryption.
You'll notice that we are using mongocryptdBypassSpawn as an extra option. We're doing this so that the client doesn't try to automatically start the mongocryptd daemon if it is already running. You may or may not want to use this in your own application.
If the connection was successful, the client is returned.
It's time to revisit the main function within the project:
In the above code, we are creating our Local Key Provider using a local key that was randomly generated. Remember, this key should match what was used when creating the data key, so random may not be the best long-term. Likewise, a local key shouldn't be used in production because of security reasons.
Once the KMS providers are established, the createEncryptedClient function is executed. Remember, this particular function will set the automatic encryption options and establish a connection to MongoDB.
To match the database and collection used in the schema map definition, we are using fle-example as the database and people as the collection. The operations that follow, such as FindOne, can be used as if field level encryption wasn't even a thing. Because we have an ssn field and the keyAltName field, the ssn field will be encrypted client-side and saved to MongoDB. When doing a lookup operation, the encrypted field will be decrypted.
When looking at the data in Atlas, for example, the encrypted fields will not be human readable as seen in the above screenshot.
When field level encryption is included in the Go application, a special build tag must be included in the build or run process, depending on the route you choose. You should already have mongocryptd and libmongocrypt installed, so to build your Go application, you'd run go build -tags cse.
If you use the above command to build your binary, you can use it as normal. However, if you're running your application without building, you can run go run -tags cse main.go instead.
The above command will run the application with client-side encryption enabled.
If you've run the example so far, you'll probably notice that while you can automatically encrypt fields and decrypt fields, you'll get an error if you try to use a filter that contains an encrypted field.
In our example thus far, we use the AEAD_AES_256_CBC_HMAC_SHA_512-Random algorithm on our encrypted fields. To be able to filter on encrypted fields, the AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic algorithm must be used. More information on the differences between the two options can be found in the documentation.
To use the deterministic approach, we need to make a few revisions to our project. These changes are a result of the fact that we won't be able to use alternate key names within our schema map.
First, let's change the schema.json file to the following:
The two changes in the above JSON reflect the new algorithm and the keyId using the actual _id value rather than an alias. For the base64 field, notice the use of the %s placeholder. If you know the base64 string version of your key, then swap it out and save yourself a bunch of work. Since this tutorial is an example and the data changes pretty much every time we run it, we probably want to swap out that field after the file is loaded.
Starting with the createDataKey function, find the following line with the CreateDataKey function call:
What we didn't see in the previous parts of this tutorial is that this function returns the _id of the data key. We should probably update our createDataKey function to return primitive.Binary and then return that dataKeyId variable.
We need to move that dataKeyId value around until it reaches where we load our JSON file. We're doing a lot of work for the following reasons:
- We're in the scenario where we don't know the _id of our data key prior to runtime. If we know it, we can add it to the schema and be done.
- We designed our code to jump around with functions.
The schema map requires a base64 value to be used, so when we pass around dataKeyId, we need to have first encoded it.
In the main function, we might have something that looks like this:
This means that the createEncryptedClient function needs to receive a string argument. Update the createEncryptedClient function to accept a string and then change how we're reading our JSON file:
Remember, we're just passing the base64 encoded value through the pipeline. By the end of this, in the readSchemaFromFile function, we can update our code to look like the following:
Not only are we receiving the base64 string, but we are using an Sprintf function to swap our %s placeholder with the actual value.
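The encode-and-substitute step can be sketched with the standard library alone. In the example below, the schema template is an assumed reconstruction of the deterministic schema described earlier, the 16-byte dataKeyID is a placeholder standing in for the Data field of the primitive.Binary returned when the data key is created, and renderSchema is a hypothetical helper name.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// schemaTemplate is an assumed schema map whose key id is filled in at
// runtime through the %s placeholder.
const schemaTemplate = `{
  "fle-example.people": {
    "bsonType": "object",
    "encryptMetadata": {
      "keyId": [
        {"$binary": {"base64": "%s", "subType": "04"}}
      ]
    },
    "properties": {
      "ssn": {
        "encrypt": {
          "bsonType": "string",
          "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
        }
      }
    }
  }
}`

// renderSchema base64-encodes the raw key id, substitutes it for the %s
// placeholder, and confirms the result is still valid JSON.
func renderSchema(template string, dataKeyID []byte) (string, error) {
	encoded := base64.StdEncoding.EncodeToString(dataKeyID)
	rendered := fmt.Sprintf(template, encoded)
	if !json.Valid([]byte(rendered)) {
		return "", errors.New("rendered schema is not valid JSON")
	}
	return rendered, nil
}

func main() {
	// Placeholder bytes standing in for a real data key id.
	dataKeyID := []byte{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
	schema, err := renderSchema(schemaTemplate, dataKeyID)
	if err != nil {
		panic(err)
	}
	fmt.Println(schema)
}
```

Because the whole file is treated as a format string, this approach relies on the schema containing no other % characters; a schema that does would need them escaped as %%.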
Again, these changes were based around how we designed our code. At the end of the day, we were really only changing the keyId in the schema map and the algorithm used for encryption. By doing this, we are not only able to decrypt fields that had been encrypted, but we're also able to filter for documents using encrypted fields.
While it might seem like we wrote a lot of code, the reality is that the code was far simpler than the concepts involved. To get a better look at the code, you can find it below:
Try to set the ATLAS_URI in your environment variables and give the code a spin.
If you ran the above code and found some encrypted data in your database, fantastic! However, if you didn't get so lucky, I want to address a few of the common problems that come up.
Let's start with the following runtime error:
If you see the above error, it is likely because you forgot to use the -tags cse flag when building or running your application. To get beyond this, just build your application with go build -tags cse.
Assuming there aren't other problems, you won't receive that error anymore.
When you build or run with the -tags cse flag, you might stumble upon the following error:
The error might not look exactly the same as mine depending on the operating system you're using, but the gist of it is that it's saying you are missing the libmongocrypt library. Make sure that you've installed it correctly for your operating system per the documentation.
Now, what if you encounter the following?
Like with the libmongocrypt error, it just means that we don't have access to mongocryptd, a requirement for automatic field level encryption. There are numerous ways to install this binary, as seen in the documentation, but on macOS it means having MongoDB Enterprise Edition nearby.
You just saw how to use MongoDB client-side field level encryption (CSFLE) in your Go application. This is useful if you'd like to encrypt fields within MongoDB documents client-side before they reach the database.
To give credit where credit is due, a lot of the code from this tutorial was taken from Kenn White's sandbox repository on GitHub.
There are a few things that I want to reiterate:
- Using a local key is a security risk in production. Either use something like AWS KMS or load your Local Key Provider with a key that was obtained through an external request.
- The mongocryptd binary must be available on the computer or server running the Go application. This is easily installed through the MongoDB Enterprise Edition installation.
- The libmongocrypt library must be available to add compatibility to the Go driver for client-side encryption and decryption.
- Don't lose your client-side key. Otherwise, you lose the ability to decrypt your fields.
In a future tutorial, we'll explore how to use AWS KMS and similar for key management.