GDPR: Impact to Your Data Management Landscape: Part 3
Welcome to part 3 of our 4-part blog series.
- In part 1, we provided a primer on the GDPR, covering its rationale and key measures
- In part 2, we explored what the GDPR means for your data platform
- In today’s part 3, we’ll discuss how MongoDB’s products and services can support you in your path to compliance
- Finally, in part 4, we’ll examine how the GDPR can help in customer experience, and provide a couple of case studies.
You can also download the complete GDPR: Impact to Your Data Management Landscape white paper.
How MongoDB Can Help Meet GDPR Requirements
While data protection regulations such as GDPR, HIPAA, PCI-DSS, and others stipulate requirements that are unique to specific regions, industries, or applications, they share a set of foundational requirements, including:
- Restricting data access, enforced via predefined privileges and roles
- Measures to protect against the accidental or malicious disclosure, loss, destruction, or damage of personal data
- The separation of duties when accessing and processing data
- Recording the activities of users, administrative staff, and applications within a database
Figure 1: MongoDB End to End Security Architecture
These requirements inform the security architecture of MongoDB, with best practices for the implementation of a secure, compliant data management platform.
Using the advanced security features available in MongoDB Enterprise Advanced and the MongoDB Atlas cloud database service, organizations have extensive capabilities to implement the data discovery, defense, and detection requirements demanded by the GDPR.
Table 1: Mapping GDPR requirements to MongoDB Enterprise Advanced capabilities
Identification of Personal Data
There are multiple ways to inspect database content. The most common method is to query the database and extract all records to identify the tables and rows (collections and documents, in MongoDB terminology) containing user data. However, this approach also requires significant manual analysis of the schema to track what data is stored, and where, while imposing processing overhead on the database itself.
MongoDB provides a much simpler approach with Compass, the GUI for MongoDB. Compass enables users to visually explore their data, providing a graphical view of their MongoDB schema by sampling a subset of documents from a collection, thereby minimizing database overhead and presenting results to the user almost instantly.
Schema visualization with MongoDB Compass enables the user to quickly explore their schema to understand the frequency, types, and ranges of fields in each data set. The user doesn’t need to be conversant with the MongoDB query language – powerful ad-hoc queries can be constructed through a point and click interface, opening up the discovery and data loss prevention process beyond developers and DBAs to Data Protection Officers and other business users.
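The idea behind sampling-based schema analysis can be illustrated outside Compass. The sketch below is a hypothetical, simplified stand-in, not Compass's implementation: it samples a subset of documents and reports, for each top-level field, how often it appears and which Python types it holds. The sample documents and the `PII_FIELD_NAMES` rule are made up for illustration.

```python
from collections import Counter, defaultdict
import random


def sample_documents(docs, size, seed=0):
    """Pick a random subset of documents (a stand-in for sampling on the server)."""
    rng = random.Random(seed)
    return rng.sample(docs, min(size, len(docs)))


def summarize_schema(sample):
    """For each top-level field, report the fraction of sampled documents that
    contain it and a count of the Python value types seen."""
    if not sample:
        return {}
    presence = Counter()
    types = defaultdict(Counter)
    for doc in sample:
        for field, value in doc.items():
            presence[field] += 1
            types[field][type(value).__name__] += 1
    return {
        field: {"frequency": presence[field] / len(sample), "types": dict(types[field])}
        for field in presence
    }


# Hypothetical customer documents.
customers = [
    {"_id": 1, "name": "Ana", "email": "ana@example.com", "country": "PT"},
    {"_id": 2, "name": "Ben", "country": "DE"},
    {"_id": 3, "name": "Chloé", "email": "chloe@example.com", "phone": "+33 1 23 45 67 89"},
]

summary = summarize_schema(sample_documents(customers, size=3))
# Hypothetical rule: treat contact-detail field names as candidate PII.
PII_FIELD_NAMES = {"email", "phone"}
pii_candidates = sorted(field for field in summary if field in PII_FIELD_NAMES)
print(pii_candidates)  # ['email', 'phone']
```

Because only a sample is read, the frequencies are estimates; a field present in few documents may be missed by a small sample.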
Beyond Compass, the MongoDB query language and rich secondary indexes enable users to query and analyze the data in multiple ways. Data can be accessed by single keys, ranges, text search, graph, and geospatial queries through to complex aggregations, typically returning responses in milliseconds. Data can be dynamically enriched with elements such as user identity, location, and last access time to add context to Personally Identifiable Information (PII), providing behavioral insights and actionable customer intelligence. Complex queries are executed natively in the database, without additional analytics frameworks or tools, avoiding the latency of the ETL processes needed to move data between operational and analytical systems in legacy enterprise architectures.
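To run a similar field inventory natively in the database, one option is an aggregation pipeline built from standard stages (`$sample`, `$project` with `$objectToArray`, `$unwind`, `$group`, `$sort`). The sketch below only builds the pipeline as plain Python data; the `customers` collection and the sample size are assumptions, and executing it requires a driver such as PyMongo.

```python
import json


def field_inventory_pipeline(sample_size):
    """Aggregation pipeline that counts how often each top-level field name appears
    in a random sample of documents, and which BSON types it holds."""
    return [
        {"$sample": {"size": sample_size}},
        # Turn each document into an array of {k: <field name>, v: <value>} pairs.
        {"$project": {"_id": 0, "fields": {"$objectToArray": "$$ROOT"}}},
        {"$unwind": "$fields"},
        # One output document per field name, with its count and observed types.
        {"$group": {
            "_id": "$fields.k",
            "count": {"$sum": 1},
            "types": {"$addToSet": {"$type": "$fields.v"}},
        }},
        {"$sort": {"count": -1}},
    ]


pipeline = field_inventory_pipeline(1000)
print(json.dumps(pipeline, indent=2))
# With PyMongo (assumed installed and connected), run it server-side:
#   for row in db.customers.aggregate(pipeline): print(row)
```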
Retention of Personal Data
Through the use of the special-purpose TTL (Time-To-Live) index, administrators can automate the expiration of EU citizen data from a database. By configuring the required retention period against a date field in the document (e.g., the date on which the user data was collected or last accessed), MongoDB automatically deletes the document once that period has elapsed. Deletion is handled by a background process that runs against the database every 60 seconds, so a document may persist briefly after it expires.
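As a minimal sketch, assuming a hypothetical `customers` collection with a `lastAccessed` date field and a one-year retention period, the code below builds the `createIndexes` command that defines such a TTL index, and mirrors the documented expiry rule locally so it can be checked without a server. The helper names are ours, not part of any MongoDB API.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # hypothetical retention period; set per your policy


def ttl_index_command(collection, date_field, retention):
    """Build a createIndexes command that defines a TTL index on date_field."""
    return {
        "createIndexes": collection,
        "indexes": [{
            "key": {date_field: 1},
            "name": f"{date_field}_ttl",
            "expireAfterSeconds": int(retention.total_seconds()),
        }],
    }


def is_expired(doc, date_field, retention, now):
    """Local mirror of the TTL rule: a document becomes eligible for deletion once
    the retention period has passed since its indexed date. Non-date values never
    expire; for an array of dates, the earliest date is used."""
    value = doc.get(date_field)
    if isinstance(value, list):
        dates = [v for v in value if isinstance(v, datetime)]
        value = min(dates) if dates else None
    if not isinstance(value, datetime):
        return False
    return value + retention < now


cmd = ttl_index_command("customers", "lastAccessed", RETENTION)
print(cmd)
# PyMongo equivalent (assumes a connected `db` handle):
#   db.customers.create_index("lastAccessed", expireAfterSeconds=int(RETENTION.total_seconds()))
```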
Compared to implementing expiration code at the application level, which must then regularly scan the database to find records that need to be deleted, the MongoDB TTL index dramatically simplifies the enforcement of data expiration policies. It also imposes significantly lower database overhead.
Access Control
Access control to a database can be separated into two distinct stages:
- Authentication, designed to confirm the identity of clients accessing the database.
- Authorization, governing what that client is entitled to do once they have access to the database, such as reading data, writing data, performing administrative and maintenance activities, and more.
MongoDB provides multiple authentication methods, allowing organizations to choose the approach best suited to the requirements of each environment. Authentication can be managed from the database itself or through integration with external authentication mechanisms.
MongoDB Atlas enforces in-database authentication via the SCRAM standard (IETF RFC 5802). Because the MongoDB Atlas service runs on public cloud platforms, it also implements additional security controls to reduce the risk of unauthorized access. By default, an Atlas cluster disallows direct access from the internet: each Atlas cluster is deployed within a virtual private environment (e.g., an AWS or GCP Virtual Private Cloud, or an Azure Virtual Network), and that private environment is configured by default to allow no inbound access. In addition, IP whitelisting can be used to restrict network access to a database (i.e., application servers are prevented from accessing the database unless their IP address has been added to the whitelist for the appropriate MongoDB Atlas group). The Atlas AWS VPC peering option allows an organization to peer its Atlas network with its own AWS VPC network, ensuring that network traffic never traverses the public internet and instead uses the internal private network.
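The effect of an IP whitelist can be illustrated with Python's standard `ipaddress` module. This is a conceptual sketch only: Atlas enforces its list within the service itself, and the entries below are hypothetical addresses from documentation-reserved ranges.

```python
import ipaddress

# Hypothetical whitelist entries: a single host (/32) and a /24 block.
ACCESS_LIST = ["203.0.113.10/32", "198.51.100.0/24"]


def is_allowed(client_ip, entries=ACCESS_LIST):
    """Return True if client_ip falls inside any whitelisted network.
    An IPv6 address never matches an IPv4 entry (and vice versa)."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(entry) for entry in entries)


for ip in ("198.51.100.42", "203.0.113.11"):
    print(ip, "allowed" if is_allowed(ip) else "blocked")
```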
MongoDB Enterprise Advanced also supports SCRAM authentication, with additional integration options for LDAP, Kerberos, or X.509 PKI certificates.
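SCRAM lets the server verify a client without the password crossing the network or being stored in plaintext. For illustration only, the sketch below implements the core key derivation and proof check from RFC 5802, using SHA-256 as in the SCRAM-SHA-256 mechanism (RFC 7677). Nonce generation, message parsing, and SASLprep normalization are omitted, the `AuthMessage` is a placeholder, and this is not MongoDB's implementation; applications should rely on their driver's built-in SCRAM support.

```python
import hashlib
import hmac
import os


def _hmac(key, msg):
    return hmac.new(key, msg, hashlib.sha256).digest()


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def salted_password(password, salt, iterations):
    # Hi(password, salt, i) from RFC 5802 is PBKDF2 with HMAC as the PRF.
    # SASLprep normalization of the password is omitted in this sketch.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def server_credentials(password, salt, iterations):
    """What the server stores instead of the password: StoredKey and ServerKey."""
    salted = salted_password(password, salt, iterations)
    stored_key = hashlib.sha256(_hmac(salted, b"Client Key")).digest()
    server_key = _hmac(salted, b"Server Key")
    return stored_key, server_key


def client_final(password, salt, iterations, auth_message):
    """Client side: return (ClientProof, expected ServerSignature)."""
    salted = salted_password(password, salt, iterations)
    client_key = _hmac(salted, b"Client Key")
    stored_key = hashlib.sha256(client_key).digest()
    proof = _xor(client_key, _hmac(stored_key, auth_message))
    expected_server_sig = _hmac(_hmac(salted, b"Server Key"), auth_message)
    return proof, expected_server_sig


def server_check(proof, stored_key, server_key, auth_message):
    """Server side: recover ClientKey from the proof, check H(ClientKey) == StoredKey,
    and return the ServerSignature (or None if the proof is wrong)."""
    recovered_client_key = _xor(proof, _hmac(stored_key, auth_message))
    if not hmac.compare_digest(hashlib.sha256(recovered_client_key).digest(), stored_key):
        return None
    return _hmac(server_key, auth_message)


salt = os.urandom(16)
iterations = 15000  # illustrative value
stored_key, server_key = server_credentials("correct horse battery", salt, iterations)

# AuthMessage = client-first-message-bare + "," + server-first-message + ","
#               + client-final-message-without-proof. Placeholder values here.
auth_message = b"n=alice,r=cnonce,r=cnonce-snonce,s=c2FsdA==,i=15000,c=biws,r=cnonce-snonce"

proof, expected = client_final("correct horse battery", salt, iterations, auth_message)
server_sig = server_check(proof, stored_key, server_key, auth_message)
# The client also checks the server's signature, giving mutual authentication.
print(server_sig is not None and hmac.compare_digest(server_sig, expected))  # True

bad_proof, _ = client_final("wrong password", salt, iterations, auth_message)
print(server_check(bad_proof, stored_key, server_key, auth_message))  # None
```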
LDAP is widely used by many organizations to standardize and simplify the way large numbers of users are managed across internal systems and applications. In many cases, LDAP is also used as the centralized authority for user access control to ensure that internal security policies are compliant with corporate and regulatory guidelines. With LDAP integration, MongoDB Enterprise Advanced can both authenticate and authorize users directly against existing LDAP infrastructure to leverage centralised access control architectures.
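When authenticating against LDAP, MongoDB Enterprise Advanced can transform the username a client supplies into an LDAP Distinguished Name (DN) using the `security.ldap.userToDNMapping` setting, an ordered list of `match`/`substitution` rules. The sketch below models that transformation under stated assumptions: the realm names and DNs are hypothetical, the regex must match the whole username, and a username that matches no rule returns `None` in this sketch.

```python
import json
import re

# Hypothetical rules in the shape of security.ldap.userToDNMapping:
# "{0}", "{1}", ... in a substitution are replaced by the regex capture groups.
USER_TO_DN_MAPPING = [
    {"match": r"(.+)@ENGINEERING\.EXAMPLE\.COM",
     "substitution": "cn={0},ou=engineering,dc=example,dc=com"},
    {"match": r"(.+)@DBA\.EXAMPLE\.COM",
     "substitution": "cn={0},ou=dba,dc=example,dc=com"},
]


def map_user_to_dn(username, rules=USER_TO_DN_MAPPING):
    """Apply the first rule, in order, whose regex matches the whole username."""
    for rule in rules:
        m = re.fullmatch(rule["match"], username)
        if m:
            return rule["substitution"].format(*m.groups())
    return None


print(map_user_to_dn("alice@ENGINEERING.EXAMPLE.COM"))
# The JSON form of the rules is what would go into the configuration setting.
print(json.dumps(USER_TO_DN_MAPPING))
```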
MongoDB Enterprise Advanced also supports authentication using a Kerberos service. Through LDAP and Kerberos, MongoDB Enterprise Advanced provides support for authentication using Microsoft Active Directory. The Active Directory domain controller authenticates MongoDB users and servers running in a Windows network, again to leverage centralised access control.
With support for X.509 certificates, MongoDB can also be integrated with Certificate Authorities (CAs), supporting both user and inter-node cryptographic authentication and reducing the risks associated with passwords or keyfiles.
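With X.509 authentication, the MongoDB user name is the client certificate's subject, and the user is created in the `$external` database. The sketch below assembles a subject string and the corresponding `createUser` command; the subject components and the `read` role on a `crm` database are hypothetical, the escaping is a simplified subset of RFC 4514, and the string must match the certificate's subject exactly for authentication to succeed.

```python
# Characters that must be backslash-escaped inside an RDN value (RFC 4514).
SPECIAL = set(',+"\\<>;')


def escape_rdn_value(value):
    """Simplified escaping: only the special characters above are handled;
    leading/trailing spaces and a leading '#' would also need escaping."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)


def subject_dn(components):
    """Join (attribute, value) pairs into a subject string, most specific RDN first."""
    return ",".join(f"{attr}={escape_rdn_value(val)}" for attr, val in components)


def x509_create_user_command(subject, roles):
    """createUser command for an X.509 user; it must run against the $external database."""
    return {"createUser": subject, "roles": [{"role": r, "db": d} for r, d in roles]}


subject = subject_dn([("CN", "reporting-app"), ("OU", "Data Protection"),
                      ("O", "Example, Inc."), ("C", "DE")])
cmd = x509_create_user_command(subject, [("read", "crm")])
print(subject)  # CN=reporting-app,OU=Data Protection,O=Example\, Inc.,C=DE
# With PyMongo (assumed installed): client["$external"].command(cmd); the application
# then connects with tls=True, tlsCertificateKeyFile=..., authMechanism="MONGODB-X509".
```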
Review the Authentication section of the documentation to learn more about the different mechanisms available in MongoDB Enterprise Advanced.
Over ten predefined roles supporting common user and administrator database privileges provide Role Based Access Control (RBAC) capabilities. With MongoDB Enterprise Advanced, these can be further customised through User Defined Roles, enabling administrators to assign fine-grained privileges to clients, based on their respective data access and processing needs. To simplify account provisioning and maintenance, roles can be delegated across teams, ensuring the enforcement of consistent policies across specific data processing functions within the organization.
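A user-defined role is created with the `createRole` command, which lists privileges as resource/action pairs. The sketch below builds such a command for a hypothetical `consentService` role on a `crm` database, and adds a small local check of whether the role's own privileges allow an action. The check ignores inherited roles and is a simplification of MongoDB's authorization logic.

```python
def create_role_command(name, privileges, inherited_roles=()):
    """Build a createRole command from (db, collection, actions) tuples."""
    return {
        "createRole": name,
        "privileges": [
            {"resource": {"db": db, "collection": coll}, "actions": list(actions)}
            for db, coll, actions in privileges
        ],
        "roles": [{"role": r, "db": d} for r, d in inherited_roles],
    }


def role_allows(role_cmd, db, collection, action):
    """Does one of the role's own privileges permit `action` on db.collection?
    An empty collection name in a resource stands for any collection in that db."""
    for priv in role_cmd["privileges"]:
        res = priv["resource"]
        if res["db"] == db and res["collection"] in ("", collection) and action in priv["actions"]:
            return True
    return False


# Hypothetical role: may read customer records and read/append consent records.
consent_service = create_role_command("consentService", [
    ("crm", "customers", ["find"]),
    ("crm", "consentLog", ["find", "insert"]),
])
print(role_allows(consent_service, "crm", "customers", "update"))  # False
# With PyMongo (assumed installed): client.crm.command(consent_service)
```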
MongoDB Enterprise Advanced also supports authorization via LDAP, in addition to the authentication discussed above. This enables existing user privileges stored in an LDAP server to be mapped to MongoDB roles, without recreating users in MongoDB itself. This integration strengthens and simplifies access control by enforcing centralised processes.
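With LDAP authorization, MongoDB queries the LDAP server for the user's groups (configured via `security.ldap.authz.queryTemplate`) and grants every role in the `admin` database whose name equals one of the returned group DNs. The sketch below models that mapping; the group DNs, the `{USER}?memberOf?base` template (a pattern suited to directories that expose `memberOf`), and the granted roles are assumptions for illustration.

```python
def ldap_group_role_command(group_dn, granted_roles):
    """createRole command, run against the admin database, named after an LDAP group DN."""
    return {"createRole": group_dn, "privileges": [],
            "roles": [{"role": r, "db": d} for r, d in granted_roles]}


def expand_query_template(template, user_dn):
    """Substitute the {USER} token in a queryTemplate value with the user's DN."""
    return template.replace("{USER}", user_dn)


def roles_for_user(group_dns, admin_roles):
    """Roles a user receives: every admin role whose name equals one of their group DNs."""
    groups = set(group_dns)
    return sorted(name for name in admin_roles if name in groups)


ALICE_DN = "cn=alice,ou=engineering,dc=example,dc=com"
DPO_GROUP = "cn=dpo,ou=groups,dc=example,dc=com"
admin_roles = {DPO_GROUP: ldap_group_role_command(DPO_GROUP, [("read", "crm")])}

print(expand_query_template("{USER}?memberOf?base", ALICE_DN))
# Suppose the LDAP query returned these groups for Alice:
alice_groups = [DPO_GROUP, "cn=staff,ou=groups,dc=example,dc=com"]
print(roles_for_user(alice_groups, admin_roles))  # ['cn=dpo,ou=groups,dc=example,dc=com']
```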
Review the Authorization section of the documentation to learn more about role-based access control in MongoDB.
Pseudonymisation & Encryption
As discussed in part 2, pseudonymisation and encryption of data is designed to prevent the identification of any specific individual in the event of data being accessed by an unauthorized party.
MongoDB provides multiple levels of pseudonymisation. Through read-only views, MongoDB can automatically filter out specific fields, such as those containing citizens’ PII, when a database is queried. Rather than querying collections directly, clients can be granted access only to specific, predefined views of the data. Permissions granted on a view are specified separately from permissions granted on the underlying collection, so clients with different access privileges can be granted different views of the data.
Read-only views allow the inclusion or exclusion of fields, masking of field values, filtering, schema transformation, grouping, sorting, limiting, and joining of data across multiple collections. Read-only views are transparent to the application accessing the data, and do not modify the underlying raw data in any way.
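As a minimal sketch, the function below builds the `create` command for a read-only view that keeps only selected fields and masks the email address, plus the privilege that lets a client read the view. The collection, view, and field names are hypothetical, and the masking expression assumes every document holds a string `email`.

```python
def masked_view_command(view, source, keep_fields, masked_field="email"):
    """create command for a read-only view. Fields not listed are dropped
    (except _id, which $project keeps by default)."""
    projection = {field: 1 for field in keep_fields}
    # Keep the first code point of the value and replace the rest with "***".
    projection[masked_field] = {
        "$concat": [{"$substrCP": [f"${masked_field}", 0, 1]}, "***"]
    }
    return {"create": view, "viewOn": source, "pipeline": [{"$project": projection}]}


def view_read_privilege(db, view):
    """Privilege granting read access to the view only, not the underlying collection."""
    return {"resource": {"db": db, "collection": view}, "actions": ["find"]}


cmd = masked_view_command("customersMasked", "customers", ["country", "signupDate"])
print(cmd)
# With PyMongo (assumed installed): client.crm.command(cmd), then include
# view_read_privilege("crm", "customersMasked") in a role given to analysts.
```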
MongoDB Enterprise Advanced can also be configured with log redaction to prevent potentially sensitive information, such as personal identifiers, from being written to the database’s diagnostic log. Developers and DBAs who may need to access the logs for database performance optimization or maintenance tasks still get visibility into metadata, such as error or operation codes, line numbers, and source file names, but are unable to see any personal data associated with database events.
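Log redaction can be enabled at startup or, as we understand the documentation, on a running server through `setParameter`. The sketch below shows that command as data, plus a toy model of the effect on a hypothetical log entry: operation metadata survives while client-supplied values are replaced. The `###` placeholder and the entry layout are assumptions of this sketch, not MongoDB's log format.

```python
# Send to the admin database, e.g. client.admin.command(ENABLE_REDACTION) with PyMongo.
ENABLE_REDACTION = {"setParameter": 1, "redactClientLogData": True}

PLACEHOLDER = "###"  # placeholder chosen for this sketch


def redact(value):
    """Replace every scalar with the placeholder, keeping field names and nesting."""
    if isinstance(value, dict):
        return {k: redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return PLACEHOLDER


def redact_log_entry(entry, client_data_keys=("query", "update")):
    """Keep metadata (operation, namespace, error code); redact client-supplied data."""
    return {k: redact(v) if k in client_data_keys else v for k, v in entry.items()}


entry = {
    "op": "update",
    "ns": "crm.customers",
    "errCode": 11000,
    "query": {"email": "ana@example.com"},
    "update": {"$set": {"phone": "+351 21 000 0000"}},
}
print(redact_log_entry(entry))
```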
Encryption protects data in transit and at rest, so that only authorized parties can read it. Even if unauthorized users gain access to a network, server, filesystem, or database, encrypted data remains protected by the encryption keys.
Support for Transport Layer Security (TLS) allows clients to connect to MongoDB over an encrypted network channel, protecting data in transit. In addition, MongoDB can encrypt data at rest, both in persistent storage and in backups.
Using the MongoDB Atlas managed database service, TLS is the default and cannot be disabled. Traffic from clients to Atlas, and between Atlas cluster nodes, is authenticated and encrypted. Encryption-at-rest is an available, no-cost option for customers using the public cloud providers’ disk and volume encryption services.
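Encryption in transit is only as strong as the client's certificate checks. As a hedged illustration using Python's standard `ssl` module (not a MongoDB API), the context below verifies the server certificate and hostname and refuses protocol versions older than TLS 1.2; the TLS 1.2 floor is our assumption of a sensible minimum.

```python
import ssl


def make_client_tls_context(ca_file=None):
    """Client-side TLS context that verifies the server's certificate and hostname
    and refuses anything older than TLS 1.2."""
    # create_default_context enables CERT_REQUIRED and hostname checking.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

With PyMongo, the comparable settings are passed as connection options, for example `MongoClient(uri, tls=True, tlsCAFile="ca.pem")`, where `uri` and `ca.pem` are placeholders.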
MongoDB Enterprise Advanced also offers the Encrypted Storage Engine, making the protection of data at-rest an integral feature of the database. By natively encrypting database files on disk, administrators reduce both the management and performance overhead of external encryption options, while providing an additional level of defense. Only those staff with the appropriate database credentials can access encrypted personal data. Access to the database file on the server would not expose any stored personal information.
The storage engine encrypts each database with a separate key. MongoDB recommends that encryption keys be rotated and replaced at regular intervals; by performing rolling restarts of the replica set, administrators can rotate keys without database downtime. When a Key Management Interoperability Protocol (KMIP) service is used, the database files themselves do not need to be re-encrypted, which also avoids the performance overhead of re-encryption during key rotation.
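The following sketch shows how the options for a KMIP-backed Encrypted Storage Engine might be assembled, and the order of a rolling master-key rotation (secondaries first, then the former primary after a step-down). It reflects our understanding that the master key held by the KMIP server protects the per-database keys, which is why rotating it does not require re-encrypting data files. Host names and file paths are placeholders, and other required options such as `--dbpath` and `--replSet` are omitted.

```python
def mongod_kmip_args(kmip_host, ca_file, client_cert, rotate_master_key=False, port=5696):
    """Assemble mongod arguments for the Encrypted Storage Engine with a KMIP server.
    (--dbpath, --replSet, and other deployment options are omitted.)"""
    args = [
        "mongod",
        "--enableEncryption",
        "--kmipServerName", kmip_host,
        "--kmipPort", str(port),
        "--kmipServerCAFile", ca_file,
        "--kmipClientCertificateFile", client_cert,
    ]
    if rotate_master_key:
        # Rotate the master key and re-encrypt the internal keystore of database keys.
        args.append("--kmipRotateMasterKey")
    return args


def rolling_rotation_plan(members, primary):
    """Restart order for a rolling rotation: each secondary in turn, then the
    primary once it has been stepped down."""
    return [m for m in members if m != primary] + [primary]


for host in rolling_rotation_plan(["db1", "db2", "db3"], primary="db1"):
    print(host, " ".join(mongod_kmip_args("kmip.example.com", "/etc/ssl/kmip-ca.pem",
                                          "/etc/ssl/mongod-kmip.pem", rotate_master_key=True)))
```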
Refer to the documentation to learn more about encryption in MongoDB.
Resilience and Disaster Recovery
To protect service availability and recover from events that cause data corruption or loss, MongoDB offers fault tolerance to systems failures, along with backup and recovery tools for disaster recovery.
Using native replication, MongoDB maintains multiple copies of data in what are called replica sets. A replica set is a fully self-healing cluster distributed across multiple nodes to eliminate single points of failure. In the event a node fails, replica failover is fully automated, eliminating the need for administrators to intervene manually to restore database availability.
The number of replicas in a MongoDB replica set is configurable: a larger number of replicas will provide increased data availability and protection against database downtime (e.g., in case of multiple machine failures, rack failures, data center failures, or network partitions). Replica sets also provide operational flexibility by providing a way to upgrade hardware and software without requiring the database to be taken offline. Replica set members can be deployed both within and across physical data centers and cloud regions, providing resilience to regional failures.
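The benefit of adding members follows from majority voting: a replica set can elect a primary only while a majority of its voting members is reachable. The short calculation below makes that concrete and shows that an even number of members adds no extra fault tolerance. The replica set name and host names in the configuration document are placeholders.

```python
def fault_tolerance(voting_members):
    """How many voting members can fail while a majority remains to elect a primary."""
    majority = voting_members // 2 + 1
    return voting_members - majority


def replica_set_config(name, hosts):
    """Configuration document for the replSetInitiate command."""
    return {"_id": name, "members": [{"_id": i, "host": h} for i, h in enumerate(hosts)]}


for n in (3, 4, 5):
    print(n, "members tolerate", fault_tolerance(n), "failure(s)")

# Members spread across zones/regions (hypothetical hosts).
config = replica_set_config("rs0", [
    "db-eu-west-1a.example.net:27017",
    "db-eu-west-1b.example.net:27017",
    "db-eu-central-1a.example.net:27017",
])
# With PyMongo connected directly to one member: client.admin.command("replSetInitiate", config)
```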
Data can be compromised by a number of unforeseen events: failure of the database or its underlying infrastructure, user error, malicious activity, or application bugs. With a backup and recovery strategy in place, administrators can restore business operations by quickly recovering their data, enabling the organization to meet regulatory and compliance obligations.
The operational tooling provided as part of MongoDB Enterprise Advanced and the MongoDB Atlas managed database service can continuously maintain database backups for you. If MongoDB experiences a failure, the most recent backup is only moments behind the operational system, minimizing exposure to data loss. The tooling offers point-in-time recovery of replica sets and cluster-wide snapshots of sharded clusters. These operations can be performed without any interruption to database service. Administrators can restore the database to precisely the moment needed, quickly and safely. Automation-driven restores allow a fully configured cluster to be re-deployed directly from the database snapshots in just a few clicks, speeding time to service recovery.
You can learn more about backup and restore in MongoDB Enterprise Advanced from the Ops Manager documentation, and from the documentation for MongoDB Atlas.
Data Sovereignty: Data Transfers Outside of the EU
To support data sovereignty requirements, MongoDB zones allow precise control over where personal data is physically stored in a cluster. Zones are also the basis for Atlas's fully managed Global Clusters. Clusters can be configured to automatically “shard” (partition) the data based on the user’s location, enabling administrators to isolate EU citizen data to physical facilities located only in regions recognised as complying with the GDPR. If EU policies on storing data in specific regions change, updating the zone's shard key ranges allows the database to automatically move personal data to alternative regions. You can learn about Global Clusters from the Atlas documentation.
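As a sketch, assume a hypothetical `crm.customers` collection sharded on a single `region` field whose values carry a region prefix (for example `EU-DE`). The code below builds the `addShardToZone` and `updateZoneKeyRange` commands and routes a key to its zone locally. Zone ranges include `min` and exclude `max`, so the string range `EU` to `EV` covers every `EU`-prefixed value. Shard names are placeholders; in practice a compound shard key with a higher-cardinality second field is common, with `MinKey`/`MaxKey` bounds from the `bson` package.

```python
# Hypothetical zones: which shards belong to each, and the [min, max) range of region values.
ZONES = {
    "EU": {"shards": ["shardEU1", "shardEU2"], "min": "EU", "max": "EV"},
    "US": {"shards": ["shardUS1"], "min": "US", "max": "UT"},
}


def zone_commands(namespace, field, zones):
    """Admin commands (run via mongos) that tag shards and assign key ranges to zones."""
    cmds = []
    for zone, spec in zones.items():
        for shard in spec["shards"]:
            cmds.append({"addShardToZone": shard, "zone": zone})
        cmds.append({"updateZoneKeyRange": namespace,
                     "min": {field: spec["min"]}, "max": {field: spec["max"]},
                     "zone": zone})
    return cmds


def route(region, zones=ZONES):
    """Local mirror of range matching: min is inclusive, max is exclusive."""
    for zone, spec in zones.items():
        if spec["min"] <= region < spec["max"]:
            return zone
    return None


for cmd in zone_commands("crm.customers", "region", ZONES):
    print(cmd)  # with PyMongo: client.admin.command(cmd)
```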
Monitoring
Proactive monitoring of all components within an application platform is always a best practice. System performance and availability depend on the timely detection and resolution of potential issues before they present problems to users. Sudden and unexpected peaks in memory and CPU utilization can, among other factors, be indicative of an attack, which can be mitigated if administrators are alerted in real time.
The operational tooling provided with MongoDB Enterprise Advanced and the MongoDB Atlas managed database service provides deep visibility into database operations. Featuring charts, custom dashboards, and automated alerting, MongoDB’s operational tooling tracks 100+ key database and system health metrics, including operation counters, memory, CPU, and storage consumption, replication and node status, open connections, queues, and many more. The metrics are securely reported to a management UI, where they are processed, aggregated, evaluated against alert conditions, and visualized in a browser, letting administrators easily track the health of MongoDB in real time. Metrics can also be pushed to Application Performance Management platforms such as AppDynamics and New Relic, supporting centralised visibility into the global IT estate.
Custom alerts can be generated when key metrics are out of range. These alerts can be sent via SMS and email, or integrated into existing incident management and collaboration systems such as PagerDuty, Slack, HipChat, and others to proactively warn of potential issues and help prevent outages or breaches.
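Alert thresholds themselves are configured in Ops Manager or Atlas, but the underlying logic is easy to picture. The sketch below evaluates a `serverStatus`-style document against out-of-range thresholds on dotted field paths. `connections.current` and `mem.resident` (in MiB) are `serverStatus` fields, while the threshold values and the sample status are hypothetical.

```python
# Hypothetical thresholds: (low, high) bounds per dotted serverStatus path.
THRESHOLDS = {"connections.current": (0, 500), "mem.resident": (0, 8192)}


def get_path(doc, dotted):
    """Follow a dotted path such as 'connections.current' into nested dicts."""
    for part in dotted.split("."):
        doc = doc[part]
    return doc


def out_of_range(status, thresholds=THRESHOLDS):
    """Return one alert message per metric outside its [low, high] bounds."""
    alerts = []
    for path, (low, high) in thresholds.items():
        value = get_path(status, path)
        if not low <= value <= high:
            alerts.append(f"{path}={value} outside [{low}, {high}]")
    return alerts


# A trimmed, made-up status document; a real one comes from the serverStatus command.
status = {"connections": {"current": 750}, "mem": {"resident": 1024}}
for alert in out_of_range(status):
    print(alert)  # forward to email, SMS, or an incident-management tool
```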
The operational tooling also enables administrators to roll out upgrades and patches to the database without application downtime. Using the MongoDB Atlas database service, patches are automatically applied, removing the overhead of manual operator intervention.
Auditing
Audit trails capture changes to personal data and database configuration for each client accessing the database, providing a log for compliance and forensic analysis by data controllers and supervisory authorities.
The MongoDB Enterprise Advanced auditing framework can log all access and actions executed against the database, including:
- Administrative actions such as adding, modifying, and removing database users, schema operations, and backups.
- Authentication and authorization activities, including failed attempts at accessing personal data.
- Read and write operations to the database.
Administrators can construct and filter audit trails for any operation against MongoDB Enterprise Advanced. They can capture all activities, or just a subset of actions, based on the requirements stipulated by the data controller and auditors. For example, it is possible to log and audit the identities of users who accessed specific documents, and any changes they made to the database during their session. Learn more from the MongoDB Enterprise Advanced auditing documentation.
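As an example of filtering, the sketch below assembles `mongod` auditing options that record read and write commands against one collection holding personal data. The filter follows the `authCheck` pattern from the MongoDB auditing documentation; the `crm.customers` namespace and log path are placeholders, and the small `matches` helper is a simplified local evaluator for this filter's shape, not MongoDB's matcher. `auditAuthorizationSuccess` is set because, by default, only authorization failures are recorded for `authCheck` events.

```python
import json

AUDIT_FILTER = {
    "atype": "authCheck",
    "param.ns": "crm.customers",  # hypothetical namespace holding personal data
    "param.command": {"$in": ["find", "insert", "delete", "update", "findandmodify"]},
}


def mongod_audit_args(audit_path):
    """mongod options that write filtered audit events as JSON to a file."""
    return [
        "mongod",
        "--auditDestination", "file",
        "--auditFormat", "JSON",
        "--auditPath", audit_path,
        "--auditFilter", json.dumps(AUDIT_FILTER),
        # Record successful authorization checks too, not only failures.
        "--setParameter", "auditAuthorizationSuccess=true",
    ]


def matches(event, flt=AUDIT_FILTER):
    """Tiny evaluator for this filter's shape: dotted-path equality and $in."""
    for path, cond in flt.items():
        value = event
        for part in path.split("."):
            value = value.get(part) if isinstance(value, dict) else None
        if isinstance(cond, dict) and "$in" in cond:
            if value not in cond["$in"]:
                return False
        elif value != cond:
            return False
    return True


print(" ".join(mongod_audit_args("/var/log/mongodb/audit.json")))
```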
The MongoDB Atlas managed database service provides an audit log of administrative actions, such as the deployment and scaling of clusters, and the addition or removal of users from an Atlas group. Atlas also provides access to database logs, which controllers can use to track user connections to the database.
Services to Help Your Teams Create a Secure Database Environment
The GDPR explicitly states the requirement for training in Article 47 (“Binding Corporate Rules”), paragraph 2(n), which calls for:
“the appropriate data protection training to personnel having permanent or regular access to personal data.”
MongoDB provides extensive training and consulting services to help customers apply best security practices:
- The MongoDB Security course is a no-cost, 3-week online training program delivered by MongoDB University.
- MongoDB University also offers a range of both public and private training for developers and operations teams, covering best practices in using and administering MongoDB.
- MongoDB Global Consulting Services offer a range of packages covering Health Checks, Production Readiness Assessments, and access to Dedicated Consulting Engineers. The MongoDB consulting engineers work directly with your teams to guide development and operations, ensuring skills transfer to your staff.
Wrapping Up Part 3
That wraps up the third part of our 4-part blog series. In Part 4, we’ll examine how the GDPR can help in customer experience, and provide a couple of case studies.
Remember, if you want to get started right now, download the complete GDPR: Impact to Your Data Management Landscape white paper today.
For a full description of the GDPR’s regulations, roles, and responsibilities, it is recommended that readers refer to the text of the GDPR (Regulation (EU) 2016/679), available from the Official Journal of the European Union, and consult legal counsel for the interpretation of how the regulations apply to their organization. Further, in order to effectively achieve the functionality described in this blog series, it is critical to ensure that the database is implemented according to the specifications and instructions detailed in the MongoDB security documentation. Readers should consider engaging MongoDB Global Consulting Services to assist with implementation.