Client-Side Field Level Encryption (CSFLE) in MongoDB with Golang
One of the many great things about MongoDB is how secure you can make
your data in it. In addition to network and user-based rules, you have
encryption of your data at rest, encryption over the wire, and, more
recently, client-side field level encryption (CSFLE).
So, what exactly is client-side field level encryption (CSFLE) and how
do you use it?
With field level encryption, you can choose to encrypt certain fields
within a document, client-side, while leaving other fields as plain
text. This is particularly useful because when viewing a CSFLE document
with the CLI, Compass, or directly within Atlas, the encrypted fields
will not be human readable. If the documents should get into the wrong
hands, those fields will be useless to the malicious user. However, when
using the MongoDB language drivers with the same encryption keys, those
fields can be decrypted and are queryable within the application.
In this quick start themed tutorial, we're going to see how to use
MongoDB field level encryption with the Go programming language
(Golang). In particular, we're going to be exploring automatic
encryption rather than manual encryption.
There are a few requirements that must be met prior to attempting to use
CSFLE with the Go driver.
- MongoDB Atlas 4.2+
- MongoDB Go driver 1.2+
This tutorial will focus on automatic encryption. Although it uses
MongoDB Atlas, automatic encryption works with version 4.2 or newer of
either MongoDB Atlas or MongoDB Enterprise Edition. You will not be able
to use automatic field level encryption with MongoDB Community Edition.
The assumption is that you're familiar with developing Go applications
that use MongoDB. If you want a refresher, take a look at the quick
start series that I published on the topic.
To use field level encryption, you're going to need a little more than
just having an appropriate version of MongoDB and the MongoDB Go driver.
We'll need libmongocrypt, which is a companion library for
encryption in the MongoDB drivers, and mongocryptd, which is a
binary for parsing automatic encryption rules based on the extended JSON
format.
Because of the libmongocrypt and mongocryptd requirements, it's
worth reviewing how to install and configure them. We'll be exploring
installation on macOS, but refer to the documentation for libmongocrypt
and mongocryptd for your particular operating system.
There are a few ways to install the libmongocrypt library on macOS, the
easiest being with Homebrew. If you've got Homebrew installed, you can
install libmongocrypt from the MongoDB tap by running the command
brew install mongodb/brew/libmongocrypt in your terminal.
Just like that, the MongoDB Go driver will be able to handle encryption.
Further explanation of the instructions can be found in the
documentation.
Because we want to do automatic encryption with the driver using an
extended JSON schema, we need mongocryptd, a binary that ships with
MongoDB Enterprise Edition. The mongocryptd binary needs to exist on
the computer or server where the Go application intends to run. It is
not a development dependency like libmongocrypt, but a runtime
dependency.
You'll want to consult the documentation on how to obtain the
mongocryptd binary as each operating system has different steps.
For macOS, you'll want to download MongoDB Enterprise Edition from the
MongoDB Download Center. You can refer to the Enterprise Edition
installation instructions for macOS to install, but the gist of the
installation involves extracting the TAR file and moving the files to
the appropriate directory.
By this point, all the appropriate components for field level encryption
should be installed or available.
Before we can start encrypting and decrypting fields within our
documents, we need to establish keys to do the bulk of the work. This
means defining our key vault location within MongoDB and the Key
Management System (KMS) we wish to use for decrypting the data
encryption keys.
The key vault is a collection that we'll create within MongoDB for
storing encrypted keys for our document fields. The primary key within
the KMS will decrypt the keys within the key vault.
For this particular tutorial, we're going to use a Local Key Provider
for our KMS. It is worth looking into something like AWS
KMS or similar, something we'll explore in
a future tutorial, as an alternative to a Local Key Provider.
On your computer, create a new Go project with the following main.go
file:
You'll need to install the MongoDB Go driver to proceed. To learn how to
do this, take a moment to check out my previous tutorial titled Quick
Start: Golang & MongoDB - Starting and
Setup.
In the above code, we have a few variables defined as well as a few
functions. We're going to focus on the kmsProviders variable and the
createDataKey function for this particular part of the tutorial.
Take a look at the following createDataKey function:
In the above createDataKey function, we are first connecting to
MongoDB. The MongoDB connection string is defined by the environment
variable ATLAS_URI in the above code. While you could hard-code this
connection string or store it in a configuration file, for security
reasons, it makes a lot of sense to use environment variables instead.
If the connection was successful, we need to define the key vault
namespace and the KMS provider as part of the encryption configuration
options. The namespace is composed of the database name followed by the
collection name. This is where the key information will be stored. The
kmsProviders map, which will be defined later, will have local key
information.
Executing the CreateDataKey function will create the key information
within MongoDB as a document.
We are choosing to specify an alternate key name of example so that we
don't have to refer to the data key by its _id when using it with our
documents. Instead, we'll be able to use the unique alternate name which
could follow a special naming convention. It is important to note that
the alternate key name is only useful when using the
AEAD_AES_256_CBC_HMAC_SHA_512-Random algorithm, something we'll explore
later in this tutorial.
To use the createDataKey function, we can make some modifications to
the main function:
In the above code, we are generating a random key. This random key is
added to the kmsProviders map that we were using within the
createDataKey function.
It is insecure to have your local key stored within the application or
on the same server. In production, consider using AWS KMS or accessing
your local key through a separate request before adding it to the Local
Key Provider.
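As a rough sketch of the key generation described above, the following
standard-library snippet creates a random 96-byte key, which is the size
the local KMS provider expects, and wraps it in the map shape used for
kmsProviders. The helper name newLocalKMSProviders is an assumption made
for illustration and is not part of the Go driver.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// localKeySize is the key length, in bytes, expected by the local KMS provider.
const localKeySize = 96

// newLocalKMSProviders is a hypothetical helper that generates a random local
// key and wraps it in the map shape used for the kmsProviders variable.
func newLocalKMSProviders() (map[string]map[string]interface{}, error) {
	localKey := make([]byte, localKeySize)
	if _, err := rand.Read(localKey); err != nil {
		return nil, fmt.Errorf("generating local key: %w", err)
	}
	return map[string]map[string]interface{}{
		"local": {"key": localKey},
	}, nil
}

func main() {
	kmsProviders, err := newLocalKMSProviders()
	if err != nil {
		panic(err)
	}
	key := kmsProviders["local"]["key"].([]byte)
	fmt.Println("local key length:", len(key))
}
```

A map like this is what would be handed to the driver's encryption
options. Because the key is regenerated on every run, it only works for
a one-off experiment.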
If you ran the code so far, you'd end up with a keyvault database and
a datakeys collection which has a document of a key with an alternate
name. That document would look something like this:
There are a few important things to note with our code so far:
- The localKey is random and does not persist beyond the runtime, which will result in key mismatches upon consecutive runs of the application. Either specify a non-random key or store it somewhere after generation.
- We're using a Local Key Provider with a key that exists locally. This is not recommended in a production scenario due to security concerns. Instead, use a provider like AWS KMS or store the key externally.
- The createDataKey function should only be executed when a particular key needs to be created, not every time the application runs.
- There is no strict naming convention for the key vault and the keys that reside in it. Name your database and collection however makes sense to you.
After we run our application the first time, we'll probably want to
comment out the createDataKey line in the main function.
With the data key created, we're at a point in time where we need to
figure out what fields should be encrypted in a document and what fields
should be left as plain text. The easiest way to do this is with a
schema map.
A schema map for encryption is extended JSON and can be added directly
to the Go source code or loaded from an external file. From a
maintenance perspective, loading from an external file is easier to
maintain.
Take a look at the following schema map for encryption:
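The schema listing isn't shown here, so what follows is only a sketch
of what such a map might look like, pieced together from the field
names used throughout this tutorial. The JSON is embedded in a Go string
purely so it can be sanity-checked with the standard library. Note that
encoding/json stands in for inspection only and is not how the driver
consumes the schema. The encryptedFields helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaJSON is an assumed reconstruction of the contents of schema.json.
const schemaJSON = `{
    "fle-example.people": {
        "encryptMetadata": {
            "keyId": "/keyAltName"
        },
        "properties": {
            "ssn": {
                "encrypt": {
                    "bsonType": "string",
                    "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Random"
                }
            }
        },
        "bsonType": "object"
    }
}`

// collectionSchema mirrors only the parts of the schema inspected here.
type collectionSchema struct {
	EncryptMetadata struct {
		KeyID string `json:"keyId"`
	} `json:"encryptMetadata"`
	Properties map[string]struct {
		Encrypt *struct {
			BSONType  string `json:"bsonType"`
			Algorithm string `json:"algorithm"`
		} `json:"encrypt"`
	} `json:"properties"`
}

// encryptedFields returns a field name to algorithm map for one namespace.
func encryptedFields(raw, namespace string) (map[string]string, error) {
	var schemas map[string]collectionSchema
	if err := json.Unmarshal([]byte(raw), &schemas); err != nil {
		return nil, err
	}
	schema, ok := schemas[namespace]
	if !ok {
		return nil, fmt.Errorf("no schema for namespace %q", namespace)
	}
	fields := make(map[string]string)
	for name, prop := range schema.Properties {
		if prop.Encrypt != nil {
			fields[name] = prop.Encrypt.Algorithm
		}
	}
	return fields, nil
}

func main() {
	fields, err := encryptedFields(schemaJSON, "fle-example.people")
	if err != nil {
		panic(err)
	}
	for name, algorithm := range fields {
		fmt.Printf("%s is encrypted with %s\n", name, algorithm)
	}
}
```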
Let's assume the above JSON exists in a schema.json file which sits
relative to our Go files or binary. In the above JSON, we're saying that
the map applies to the people collection within the fle-example
database.
The keyId field within the encryptMetadata object says that documents
within the people collection must have a string field called
keyAltName. The value of this field will reflect the alternate key
name that we defined when creating the data key. Notice the / that
prefixes the value. That is not an error. It is a requirement for this
particular value since it is a pointer.
The properties field lists fields within our document and in this
example lists the fields that should be encrypted along with the
encryption algorithm to use. In our example, only the ssn field will
be encrypted while all other fields will remain as plain text.
There are two algorithms currently supported:
- AEAD_AES_256_CBC_HMAC_SHA_512-Random
- AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic
In short, the AEAD_AES_256_CBC_HMAC_SHA_512-Random algorithm is best
used on fields that have low cardinality or don't need to be used within
a filter for a query. The AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic
algorithm should be used for fields with high cardinality or for fields
that need to be used within a filter.
To learn more about these algorithms, visit the documentation.
We'll be exploring both algorithms in this particular tutorial.
If we wanted to, we could change the schema map to the following:
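The listing for this variation isn't shown here either, so treat the
following as a sketch. It assumes the only difference is that the key
pointer sits inside the field's encrypt object instead of at the
collection level. As before, the JSON is wrapped in Go so the
hypothetical keyPointers helper can read it back with the standard
library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// perFieldSchemaJSON is an assumed variation of schema.json in which the
// key pointer is declared on the field itself.
const perFieldSchemaJSON = `{
    "fle-example.people": {
        "properties": {
            "ssn": {
                "encrypt": {
                    "keyId": "/keyAltName",
                    "bsonType": "string",
                    "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Random"
                }
            }
        },
        "bsonType": "object"
    }
}`

// fieldSchema captures the per-field encryption settings read below.
type fieldSchema struct {
	Encrypt struct {
		KeyID     string `json:"keyId"`
		Algorithm string `json:"algorithm"`
	} `json:"encrypt"`
}

// perFieldSchema mirrors only the properties section of one namespace.
type perFieldSchema struct {
	Properties map[string]fieldSchema `json:"properties"`
}

// keyPointers maps "namespace.field" to the key pointer declared for it.
func keyPointers(raw string) (map[string]string, error) {
	var schemas map[string]perFieldSchema
	if err := json.Unmarshal([]byte(raw), &schemas); err != nil {
		return nil, err
	}
	pointers := make(map[string]string)
	for namespace, schema := range schemas {
		for field, prop := range schema.Properties {
			pointers[namespace+"."+field] = prop.Encrypt.KeyID
		}
	}
	return pointers, nil
}

func main() {
	pointers, err := keyPointers(perFieldSchemaJSON)
	if err != nil {
		panic(err)
	}
	for field, pointer := range pointers {
		fmt.Printf("%s uses key pointer %s\n", field, pointer)
	}
}
```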
The change made in the above example has to do with the keyId field.
Rather than declaring it as part of the encryptMetadata, we've
declared it as part of a particular field. This could be useful if you
want to use different keys for different fields.
Remember, the pointer used for the keyId will only work with the
AEAD_AES_256_CBC_HMAC_SHA_512-Random algorithm. You can, however, use
the actual key id for both algorithms.
With a schema map for encryption available, let's get it loaded in the
Go application. Change the readSchemaFromFile function to look like
the following:
In the above code, we are reading the file, which will be the
schema.json file soon enough. If it is read successfully, we use the
UnmarshalExtJSON function to load it into a bson.M object that is
more pleasant to work with in Go.
By this point, you should have the code in place for creating a data key
and a schema map defined to be used with the automatic client encryption
functionality that MongoDB supports. It's time to bring it together to
actually encrypt and decrypt fields.
We're going to start with the createEncryptedClient function within
our project:
In the above code we are making use of the readSchemaFromFile function
that we had just created to load our schema map for encryption. Next, we
are defining our auto encryption options and establishing a connection
to MongoDB. This will look similar to what we did in the createDataKey
function. When defining the auto encryption options, not only are we
specifying the KMS for our key and vault, but we're also supplying the
schema map for encryption.
You'll notice that we are using mongocryptdBypassSpawn as an extra
option. We're doing this so that the client doesn't try to automatically
start the mongocryptd daemon if it is already running. You may or
may not want to use this in your own application.
If the connection was successful, the client is returned.
It's time to revisit the main function within the project:
In the above code, we are creating our Local Key Provider using a local
key that was randomly generated. Remember, this key should match what
was used when creating the data key, so a random key may not be the best
choice long-term. Likewise, a local key shouldn't be used in production
for security reasons.
Once the KMS providers are established, the createEncryptedClient
function is executed. Remember, this particular function will set the
automatic encryption options and establish a connection to MongoDB.
To match the database and collection used in the schema map definition,
we are using fle-example as the database and people as the collection.
The operations that follow, such as InsertOne and FindOne, can be used
as if field level encryption wasn't even a thing. Because we have an ssn
field and the keyAltName field, the ssn field will be encrypted
client-side and saved to MongoDB. When doing a lookup operation, the
encrypted field will be decrypted.
When looking at the data in Atlas, for example, the encrypted fields
will not be human readable as seen in the above screenshot.
When field level encryption is included in the Go application, a special
tag must be included in the build or run process, depending on the route
you choose. You should already have mongocryptd and
libmongocrypt, so to build your Go application, you'd pass the cse
build tag, as in go build -tags cse.
If you build your binary this way, you can use it as normal. However,
if you're running your application without building, you can pass the
same tag to the run command, as in go run -tags cse main.go.
The above command will run the application with client-side encryption
enabled.
If you've run the example so far, you'll probably notice that while you
can automatically encrypt fields and decrypt fields, you'll get an error
if you try to use a filter that contains an encrypted field.
In our example thus far, we use the AEAD_AES_256_CBC_HMAC_SHA_512-Random
algorithm on our encrypted fields. To be able to filter on encrypted
fields, the AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic algorithm must
be used. More information on the two options can be found in the
documentation.
To use the deterministic approach, we need to make a few revisions to
our project. These changes are a result of the fact that we won't be
able to use alternate key names within our schema map.
First, let's change the schema.json file to the following:
The two changes in the above JSON reflect the new algorithm and the
keyId using the actual _id value rather than an alias. For the base64
field, notice the use of the %s placeholder. If you know the
base64 string version of your key, then swap it out and save yourself a
bunch of work. Since this tutorial is an example and the data changes
pretty much every time we run it, we probably want to swap out that
field after the file is loaded.
Starting with the createDataKey function, find the following line with
the CreateDataKey function call:
What we didn't see in the previous parts of this tutorial is that this
function returns the _id of the data key. We should probably update
our createDataKey function to return primitive.Binary and then
return that dataKeyId variable.
We need to move that dataKeyId value around until it reaches where we
load our JSON file. We're doing a lot of work for the following reasons:
- We're in the scenario where we don't know the _id of our data key prior to runtime. If we know it, we can add it to the schema and be done.
- We designed our code to jump around with functions.
The schema map requires a base64 value to be used, so when we pass
around dataKeyId, we need to have first encoded it.
In the main function, we might have something that looks like this:
This means that the createEncryptedClient function needs to receive a
string argument. Update createEncryptedClient to accept a string and
then change how we're reading our JSON file:
Remember, we're just passing the base64 encoded value through the
pipeline. By the end of this, in the readSchemaFromFile function, we
can update our code to look like the following:
Not only are we receiving the base64 string, but we are using an
Sprintf function to swap our %s placeholder with the actual value.
Again, these changes were based around how we designed our code. At the
end of the day, we were really only changing the keyId in the schema
map and the algorithm used for encryption. By doing this, we are not
only able to decrypt fields that had been encrypted, but we're also able
to filter for documents using encrypted fields.
While it might seem like we wrote a lot of code, the reality is that the
code was far simpler than the concepts involved. To get a better look at
the code, you can find it below:
Try to set the ATLAS_URI in your environment variables and give the
code a spin.
If you ran the above code and found some encrypted data in your
database, fantastic! However, if you didn't get so lucky, I want to
address a few of the common problems that come up.
Let's start with the following runtime error:
If you see the above error, it is likely because you forgot to use the
-tags cse flag when building or running your application. To get
beyond this, just build your application with the following:
Assuming there aren't other problems, you won't receive that error
anymore.
When you build or run with the -tags cse flag, you might stumble upon
the following error:
The error might not look exactly the same as mine depending on the
operating system you're using, but the gist of it is that it's saying
you are missing the libmongocrypt library. Make sure that you've
installed it correctly for your operating system per the
documentation.
Now, what if you encounter the following?
Like with the libmongocrypt error, it just means that we don't have
access to mongocryptd, a requirement for automatic field level
encryption. There are numerous ways to install this binary, as seen in
the documentation, but on macOS it means having MongoDB Enterprise
Edition installed.
You just saw how to use MongoDB client-side field level encryption
(CSFLE) in your Go application. This is useful if you'd like to encrypt
fields within MongoDB documents client-side before they reach the
database.
To give credit where credit is due, a lot of the code from this tutorial
was taken from Kenn White's sandbox repository on GitHub.
There are a few things that I want to reiterate:
- Using a local key is a security risk in production. Either use something like AWS KMS or load your Local Key Provider with a key that was obtained through an external request.
- The mongocryptd binary must be available on the computer or server running the Go application. This is easily installed through the MongoDB Enterprise Edition installation.
- The libmongocrypt library must be available to add compatibility to the Go driver for client-side encryption and decryption.
- Don't lose your client-side key. Otherwise, you lose the ability to decrypt your fields.
In a future tutorial, we'll explore how to use AWS KMS and similar for
key management.
Questions? Comments? We'd love to connect with you. Join the
conversation on the MongoDB Community
Forums.