Hi Wyatt. Great question! What's important to understand, as you point out, is that it's not the Field Level Encryption (FLE) data encryption keys (DEKs) themselves that reside in the database; only the encrypted (wrapped) DEKs are stored. In fact, at no point are plaintext field data or plaintext DEKs revealed to the database (and by extension the DBA, the VM owner, or the infrastructure/cloud provider) for data encrypted with FLE. This means that even with a backup, a DBA would need both a full snapshot of the database containing the deleted key and access to that application user's specific master key, or, more likely, IAM access to make KEK decrypt requests via the KMS.
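To make the envelope-encryption model concrete, here's a toy sketch in Python. It is emphatically not how libmongocrypt works internally (the XOR keystream below is a stand-in for real AEAD encryption, and the in-memory variables stand in for the keyvault collection and the KMS), but it shows why a backup alone is useless: the database only ever holds the wrapped DEK and the ciphertext.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode, XORed over the data.
    # A placeholder for real authenticated encryption; do not use for real data.
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

# The customer master key (KEK) lives only in the "KMS"; the database never sees it.
kms_master_key = secrets.token_bytes(32)

# The client generates a DEK, wraps it with the KEK, and stores only the wrapped copy.
dek = secrets.token_bytes(32)
wrapped_dek = keystream_xor(kms_master_key, dek)       # what the keyvault collection holds
ciphertext = keystream_xor(dek, b"ssn: 123-45-6789")   # what the data collection holds

# A DBA with a full backup sees only wrapped_dek and ciphertext.
# Decryption requires a KMS unwrap call, i.e. access to the master key:
recovered_dek = keystream_xor(kms_master_key, wrapped_dek)
assert keystream_xor(recovered_dek, ciphertext) == b"ssn: 123-45-6789"
```

Note that because XOR is its own inverse here, the same function wraps and unwraps; real KMS providers expose distinct encrypt/decrypt operations on the KEK.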
If the concern is about DBAs or some other party that does have access to both the decrypted keys and the backups, there are a couple of possible solutions depending on your threat model. If the primary motivation is simply to provably ensure that deleted plaintext user records stay deleted no matter what, this becomes a timing and separation-of-concerns exercise, and the most straightforward solution is to move the keyvault collection to a different database or cluster entirely, configured with a much shorter backup retention. FLE does not assume your encrypted keyvault collection is co-resident with your active cluster, or that it shares the same access controls and backup history; it only requires that the client can make an authenticated connection to the keyvault database when needed. Note, however, that with a shorter backup cycle, in the event of catastrophic data corruption (malicious, intentional, or accidental), all keys for that database (and therefore all encrypted data) are only recoverable to the point in time covered by the shorter keyvault backup.
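For reference, here's roughly what a split keyvault looks like from the driver's side, sketched with PyMongo's client-side encryption options. The hostnames, credentials, and namespace are placeholders, and running this requires pymongocrypt and a live deployment; it's a configuration sketch, not a complete application.

```python
# Sketch: point the FLE keyvault at a *separate* cluster with its own
# access controls and a shorter backup retention window.
# Hostnames, credentials, and KMS values below are placeholders.
from pymongo import MongoClient
from pymongo.encryption_options import AutoEncryptionOpts

kms_providers = {"aws": {"accessKeyId": "...", "secretAccessKey": "..."}}

# Dedicated keyvault cluster, separate from the production data cluster.
key_vault_client = MongoClient("mongodb+srv://keyvault-cluster.example.net")

auto_encryption_opts = AutoEncryptionOpts(
    kms_providers,
    key_vault_namespace="encryption.__keyVault",
    key_vault_client=key_vault_client,  # wrapped DEKs read from the dedicated cluster
)

# The application connects to the main data cluster as usual.
client = MongoClient(
    "mongodb+srv://data-cluster.example.net",
    auto_encryption_opts=auto_encryption_opts,
)
```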
Note also that if you are using a KMS, IAM policies can enforce IP allow-lists (e.g., initially scoped strictly to your production app server VLAN) and can even be set to require MFA for decrypt operations on a per-IAM-user/role basis, and CloudTrail triggers can be set to alert on uncommon events.
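On AWS, such a policy statement might look something like the following (the account ID, key ID, and CIDR are placeholders; also note that `aws:SourceIp` conditions don't apply to requests made through VPC endpoints, and the MFA condition is generally only meaningful for human IAM principals, not application roles):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDecryptFromAppServersOnly",
      "Effect": "Allow",
      "Action": ["kms:Decrypt"],
      "Resource": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
      "Condition": {
        "IpAddress": { "aws:SourceIp": "10.0.1.0/24" },
        "Bool": { "aws:MultiFactorAuthPresent": "true" }
      }
    }
  ]
}
```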
If the concern is potential insider attacks by database administrators, it may make sense to segregate responsibilities so that DBAs have no access to the production IAM/KMS accounts (or, if self-managing master keys, to a secrets manager such as HashiCorp Vault) and thus no ability to recover any plaintext FLE-protected data.
It's also possible to use multiple master keys, though we wouldn't recommend a per-application-user or per-document granularity once more than a small number of keys would be involved.
Lastly, I should point out that there's nothing Atlas-specific in any of the above. Atlas is oblivious to whether or not FLE has been enabled; in fact, short of manually scanning for BinData subtype 6 records or the presence of server-side FLE-specific JSON schema validation, I'm not sure how one could even determine that FLE is in use. All of that to say: no, there's no baseline difference in how Atlas handles backups for FLE-enabled clusters. That said, one major advantage of running on Atlas (besides all the other benefits of a fully managed global service) is that you get automatic transparent encryption, i.e., full access to mongocryptd.
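Out of curiosity, here's what scanning for BinData subtype 6 would look like at the raw-BSON level. This is a toy, hand-rolled parser (it only understands the two element types it constructs, and real tooling would just use a BSON library), but it shows the on-disk signature that gives FLE away: a binary element whose subtype byte is 6.

```python
import struct

def bson_binary_element(name: bytes, subtype: int, payload: bytes) -> bytes:
    # Type 0x05 = binary: int32 payload length, subtype byte, payload bytes.
    return b"\x05" + name + b"\x00" + struct.pack("<i", len(payload)) + bytes([subtype]) + payload

def bson_int32_element(name: bytes, value: int) -> bytes:
    # Type 0x10 = int32.
    return b"\x10" + name + b"\x00" + struct.pack("<i", value)

def bson_doc(elements: bytes) -> bytes:
    body = elements + b"\x00"                  # 0x00 terminates the document
    return struct.pack("<i", len(body) + 4) + body

def has_fle_ciphertext(doc: bytes) -> bool:
    """Walk top-level elements looking for binary subtype 6 (FLE ciphertext)."""
    i = 4                                      # skip the int32 document length
    while doc[i] != 0:
        etype = doc[i]
        i = doc.index(0, i + 1) + 1            # skip type byte + cstring field name
        if etype == 0x05:                      # binary element
            blen = struct.unpack_from("<i", doc, i)[0]
            if doc[i + 4] == 6:                # subtype byte follows the length
                return True
            i += 5 + blen
        elif etype == 0x10:                    # int32 element
            i += 4
        else:
            raise NotImplementedError("toy parser: only binary and int32")
    return False

encrypted_doc = bson_doc(bson_binary_element(b"ssn", 6, b"\x01ciphertext-bytes"))
plain_doc = bson_doc(bson_int32_element(b"count", 7))
assert has_fle_ciphertext(encrypted_doc)
assert not has_fle_ciphertext(plain_doc)
```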
I hope that helps. Feel free to reach out any time to me or my colleagues here if you have any questions.
p.s. Some other resources that might be useful for you:
Atlas Security whitepaper (which covers some of the internals of FLE keys):
Official FLE docs (updated regularly):
The “MedCo” step by step tutorial for FLE, with full examples in 5 languages:
A talk I gave at .Live this year on the FLE architecture:
Guide to MongoDB Client-Side Field Level Encryption:
A (very unofficial) FLE Sandbox quick-start for lots of different platforms & languages; also includes guidance on scoping KMS IAM policies:
Recent post from DevHub, a short tutorial on using FLE with Go (golang):