GIANT Stories at MongoDB

GDPR: Impact to Your Data Management Landscape: Part 4

Mat Keep
September 18, 2017
Business

Welcome to the final installment of our 4-part blog series.

  • In part 1, we provided a primer into the GDPR – covering its rationale, and key measures
  • In part 2, we explored what the GDPR means for your data platform
  • In part 3, we discussed how MongoDB’s products and services can support you in your path to compliance
  • Finally, in this part 4, we’re going to examine how the GDPR can help in customer experience, and provide a couple of case studies.

If you can’t wait for all 4 parts of the series, but would rather get started now, download the complete GDPR: Impact to Your Data Management Landscape whitepaper today.

Using the GDPR for Customer Experience Transformation

As discussed in parts 1 and 2 of this blog series, to comply with the GDPR, organizations will need to identify all personal data within their systems. Forward-looking companies can leverage the personal data discovery processes required by the regulation to transform how they interact with their customers.

Marketing and sales groups have long seen the value in aggregating data from multiple, disconnected systems into a single, holistic, real-time representation of their customer. This single view can help in enhancing customer insight and intelligence – with the ability to better understand and predict customer preferences, behaviors, and needs.

However, for many organizations, delivering a single view of the customer to the business has been elusive. Technology has been one limitation – for example, the rigid, tabular data model imposed by traditional relational databases inhibits the schema flexibility necessary to accommodate the diverse customer data sets contained in multiple source systems. But limitations extend beyond just the technology to include the business processes needed to deliver and maintain a single view.

MongoDB has been used in many single view projects across enterprises of all sizes and industries. Through the best practices observed and institutionalized over the years, MongoDB has developed a repeatable, 10-step methodology to successfully delivering a single view.

Figure 1: 10-step methodology for building a single customer view with MongoDB

You can learn more by downloading the MongoDB single view whitepaper, covering:

  • The 10-step methodology to delivering a single view
  • The required technology capabilities and tools to accelerate project delivery
  • Case studies from customers who have built transformational single view applications on MongoDB

Case Studies

MongoDB has been downloaded over 30 million times and counts 50% of the Fortune 100 as commercial customers of MongoDB’s products and services. Among the Fortune 500 and Global 500, MongoDB customers include:

  • 40 of the top financial services institutions
  • 15 of the top retailers
  • 15 of the top telcos
  • 15 of the top healthcare companies
  • 10 of the top media and entertainment companies

MongoDB is used by enterprises of all sizes and industries to build modern applications, often as part of digital transformation initiatives in the cloud. An example of such a company is Estates Gazette, the UK’s leading commercial property data service.

Estates Gazette (EG)

The company’s business was built on print media, with the Estates Gazette journal serving as the authoritative source on commercial property across the UK for well over a century. Back in the 1990s, the company was quick to identify the disruptive potential of the Internet, embracing it as a new channel for information distribution. Pairing its rich catalog of property data and market intelligence with new data sources from mobile and location services – and the ability to run sophisticated analytics across all of it in the cloud – the company is now accelerating its move into enriched market insights, complemented with decision support systems.

To power its digital transformation, Estates Gazette migrated from legacy relational databases to MongoDB, running an event-driven architecture and microservices, deployed to the Amazon Web Services (AWS) cloud. The company is also using MongoDB Enterprise Advanced with the Encrypted storage engine to extend its security profile, and prepare for the EU GDPR.

You can learn more by reading the Estates Gazette case study.

Leading European Retailer

As part of its ongoing digital transformation, which extends customer engagement beyond brick and mortar stores to mobile channels, the retailer, with over 50,000 employees and €4.5bn in annual sales, was building a new mobile app offering opt-in marketing services to collect customer data, storing it in MongoDB.

As part of its GDPR readiness, the retailer employed MongoDB Global Consulting Services to advise on data protection best practices, taking advantage of the MongoDB Enterprise Advanced access controls, encryption, and auditing framework. By using MongoDB consultants in the design phase of the project, the retailer has been able to adopt a “security by design and by default” approach, while enhancing its security posture.

Wrapping Up our 4-Part Blog Series

That wraps up the final part of our 4-part blog series. If you want to read the entire series in one place, download the complete GDPR: Impact to Your Data Management Landscape whitepaper today.

It’s worth remembering that it takes much more than the security controls of a database to achieve GDPR compliance. However, MongoDB offers a holistic vision of how database customers can accelerate their path to meeting the regulation, which is scheduled for enforcement from May 2018.

Using the advanced security features available in MongoDB Enterprise Advanced and the MongoDB Atlas managed database service, organizations have extensive capabilities to implement the data discovery, defense, and detection requirements demanded by the GDPR. Methodologies used in successfully delivering customer single view projects can be used to support data discovery, and used to innovate in delivering a differentiated customer experience.

Disclaimer
For a full description of the GDPR’s regulations, roles, and responsibilities, it is recommended that readers refer to the text of the GDPR (Regulation (EU) 2016/679), available from the Official Journal of the European Union, and refer to legal counsel for the interpretation of how the regulations apply to their organization. Further, in order to effectively achieve the functionality described in this blog series, it is critical to ensure that the database is implemented according to the specifications and instructions detailed in the MongoDB security documentation. Readers should consider engaging MongoDB Global Consulting Services to assist with implementation.

GDPR: Impact to Your Data Management Landscape: Part 3

Mat Keep
September 11, 2017
Business

Welcome to part 3 of our 4-part blog series.

  • In part 1, we provided a primer into the GDPR – covering its rationale, and key measures
  • In part 2, we explored what the GDPR means for your data platform
  • In today’s part 3, we’ll discuss how MongoDB’s products and services can support you in your path to compliance
  • Finally, in part 4, we’ll examine how the GDPR can help in customer experience, and provide a couple of case studies.

If you can’t wait for all 4 parts of the series, but would rather get started now, download the complete GDPR: Impact to Your Data Management Landscape white paper today.

How MongoDB Can Help Meet GDPR Requirements

While data protection regulations such as GDPR, HIPAA, PCI-DSS, and others stipulate requirements that are unique to specific regions, industries or applications, there are foundational requirements common across all of the directives, including:

  • Restricting data access, enforced via predefined privileges and roles
  • Measures to protect against the accidental or malicious disclosure, loss, destruction, or damage of personal data
  • The separation of duties when accessing and processing data
  • Recording user, administrative staff, and application activities against the database


Figure 1: MongoDB End to End Security Architecture

These requirements inform the security architecture of MongoDB, with best practices for the implementation of a secure, compliant data management platform.

Using the advanced security features available in MongoDB Enterprise Advanced and the MongoDB Atlas cloud database service, organizations have extensive capabilities to implement the data discovery, defense, and detection requirements demanded by the GDPR.

Table 1: Mapping GDPR requirements to MongoDB Enterprise Advanced capabilities

Discover

Identification of Personal Data

There are multiple ways to inspect database content. The most common method is to query the database and extract all records to identify the tables and rows (collections and documents, in MongoDB terminology) containing user data. However, this approach also requires significant manual analysis of the schema to track what data is stored, and where, while imposing processing overhead on the database itself.

MongoDB provides a much simpler approach with Compass, the GUI for MongoDB. Compass enables users to visually explore their data, providing a graphical view of their MongoDB schema by sampling a subset of documents from a collection, thereby minimizing database overhead and presenting results to the user almost instantly.

Schema visualization with MongoDB Compass enables the user to quickly explore their schema to understand the frequency, types, and ranges of fields in each data set. The user doesn’t need to be conversant with the MongoDB query language – powerful ad-hoc queries can be constructed through a point and click interface, opening up the discovery and data loss prevention process beyond developers and DBAs to Data Protection Officers and other business users.

Beyond Compass, the MongoDB query language and rich secondary indexes enable users to query and analyze the data in multiple ways. Data can be accessed by single keys, ranges, text search, graph, and geospatial queries through to complex aggregations, returning responses in milliseconds. Data can be dynamically enriched with elements such as user identity, location, and last access time to add context to Personally Identifiable Information (PII), providing behavioral insights and actionable customer intelligence. Complex queries are executed natively in the database without the need for additional analytics frameworks or tools, avoiding the latency that comes from the ETL processes required to move data between operational and analytical systems in legacy enterprise architectures.

Figure 2: Data discovery with MongoDB Compass GUI-based schema exploration
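
To complement the GUI-based exploration shown above, the kind of ad-hoc discovery query described here can also be scripted. The following is a minimal sketch using the Python driver (PyMongo); the database, collection, and field names are hypothetical:

```python
from pymongo import MongoClient

# Connect to a MongoDB deployment (adjust the URI for your environment).
client = MongoClient("mongodb://localhost:27017")
customers = client["crm"]["customers"]  # hypothetical collection holding personal data

# Sample a handful of documents to see which fields are present.
# Compass uses the same sampling idea to visualize the schema.
for doc in customers.aggregate([{"$sample": {"size": 5}}]):
    print(sorted(doc.keys()))

# Ad-hoc discovery query: find documents containing an email address,
# projecting only the fields relevant to the data discovery exercise.
for doc in customers.find({"email": {"$exists": True}},
                          {"email": 1, "country": 1, "lastAccessed": 1}):
    print(doc)
```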

Retention of Personal Data

Through the use of the special-purpose TTL (Time-To-Live) index, administrators can automate the expiration of EU citizen data from a database. By configuring the required retention period against a date field in the document (i.e. the date on which the user data was collected or last accessed), MongoDB will delete the document once the period has been reached, using an automated background process that runs against the database every 60 seconds.

Compared to implementing expiration code at the application level, which must then regularly scan the database to find records that need to be deleted, the MongoDB TTL index dramatically simplifies the enforcement of data expiration policies. It also imposes significantly lower database overhead.
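
As a minimal sketch of the TTL mechanism (the collection, field name, and 30-day retention period below are hypothetical), the retention policy is declared when the index is created:

```python
import datetime

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
consents = client["crm"]["consents"]  # hypothetical collection of collected personal data

# Expire documents 30 days after the date stored in 'collectedAt'.
# A background task runs roughly every 60 seconds and removes expired documents.
consents.create_index([("collectedAt", ASCENDING)],
                      expireAfterSeconds=30 * 24 * 60 * 60)

# The indexed field must hold a BSON date for the TTL index to apply.
consents.insert_one({"subjectId": "abc123",
                     "email": "subject@example.com",
                     "collectedAt": datetime.datetime.utcnow()})
```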

Defend

Access Control

Access control to a database can be separated into two distinct stages:
1. Authentication, designed to confirm the identity of clients accessing the database.
2. Authorization, governing what that client is entitled to do once they have access to the database, such as reading data, writing data, performing administrative and maintenance activities, and more.

MongoDB Authentication

MongoDB provides multiple authentication methods, allowing the approach best suited to meet the requirements of different environments. Authentication can be managed from the database itself, or through integration with external authentication mechanisms.

MongoDB Atlas enforces in-database authentication via the SCRAM IETF RFC 5802 standard. As the MongoDB Atlas service runs on public cloud platforms, it also implements additional security controls to reduce the risk of unauthorized access. An Atlas cluster disallows direct access from the internet by default: each cluster is deployed within a virtual private environment (e.g., AWS or GCP Virtual Private Cloud, Azure Virtual Network), and that private environment is configured by default to allow no inbound access. IP whitelisting can also be used to restrict network access to a database (i.e., application servers are prevented from accessing the database unless their IP address has been added to the whitelist for the appropriate MongoDB Atlas group). The Atlas AWS VPC peering option allows peering an organization’s Atlas network with its own AWS VPC network, thereby ensuring network traffic never traverses the public internet and instead uses the internal private network.

MongoDB Enterprise Advanced also allows SCRAM authentication, with additional integration options for LDAP, Kerberos, or x.509 PKI certificates.

LDAP is widely used by many organizations to standardize and simplify the way large numbers of users are managed across internal systems and applications. In many cases, LDAP is also used as the centralized authority for user access control to ensure that internal security policies are compliant with corporate and regulatory guidelines. With LDAP integration, MongoDB Enterprise Advanced can both authenticate and authorize users directly against existing LDAP infrastructure to leverage centralised access control architectures.

MongoDB Enterprise Advanced also supports authentication using a Kerberos service. Through LDAP and Kerberos, MongoDB Enterprise Advanced provides support for authentication using Microsoft Active Directory. The Active Directory domain controller authenticates MongoDB users and servers running in a Windows network, again to leverage centralised access control.

With support for x.509 certificates, MongoDB can also be integrated with Certificate Authorities (CAs), supporting both user and inter-node cryptographic authentication, and reducing the risks associated with passwords or keyfiles.

Review the Authentication section of the documentation to learn more about the different mechanisms available in MongoDB Enterprise Advanced.
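
As a brief sketch of in-database (SCRAM) authentication from an application, using the Python driver; the host, user, and credentials below are placeholders:

```python
from pymongo import MongoClient

# Authenticate with SCRAM credentials stored in the 'admin' user database.
client = MongoClient(
    "mongodb://app_user:app_password@db1.example.net:27017/"
    "?authSource=admin&authMechanism=SCRAM-SHA-1"
)

# The connection can only perform operations the authenticated user is authorized for.
print(client["crm"].list_collection_names())
```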

MongoDB Authorization

MongoDB provides Role-Based Access Control (RBAC) through more than ten predefined roles that cover common user and administrator database privileges. With MongoDB Enterprise Advanced, these can be further customised through User Defined Roles, enabling administrators to assign fine-grained privileges to clients, based on their respective data access and processing needs. To simplify account provisioning and maintenance, roles can be delegated across teams, ensuring the enforcement of consistent policies across specific data processing functions within the organization.

MongoDB Enterprise Advanced also supports authorization via LDAP, in addition to the authentication discussed above. This enables existing user privileges stored in an LDAP server to be mapped to MongoDB roles, without recreating users in MongoDB itself. This integration strengthens and simplifies access control by enforcing centralised processes.

Review the Authorization section of the documentation to learn more about [role-based access control in MongoDB](http://docs.mongodb.org/master/core/authorization/).
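
To illustrate, here is a hedged sketch that creates a User Defined Role with read-only access to a single collection, plus a user restricted to that role; the role, user, database, and collection names are hypothetical:

```python
from pymongo import MongoClient

# Connect as a user with privileges to manage roles and users (e.g., userAdmin on 'crm').
client = MongoClient("mongodb://admin:admin_password@localhost:27017/?authSource=admin")
db = client["crm"]

# A custom role that may only read the 'customers' collection.
db.command("createRole", "customerReadOnly",
           privileges=[{
               "resource": {"db": "crm", "collection": "customers"},
               "actions": ["find"]
           }],
           roles=[])

# A user limited to that role, e.g., for a reporting application.
db.command("createUser", "marketing_analyst",
           pwd="analyst_password",
           roles=[{"role": "customerReadOnly", "db": "crm"}])
```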

Pseudonymisation & Encryption

As discussed in part 2, pseudonymisation and encryption of data is designed to prevent the identification of any specific individual in the event of data being accessed by an unauthorized party.

Pseudonymisation

MongoDB provides multiple levels of pseudonymisation. Through read-only views, MongoDB can automatically filter out specific fields, such as those containing the PII of citizens, when a database is queried. Rather than querying collections directly, clients can be granted access only to specific, predefined views of the data. Permissions granted against the view are specified separately from permissions granted to the underlying collection, so clients with different access privileges can be given different views of the data.

Read-only views allow the inclusion or exclusion of fields, masking of field values, filtering, schema transformation, grouping, sorting, limiting, and joining of data across multiple collections. Read-only views are transparent to the application accessing the data, and do not modify the underlying raw data in any way.
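
A minimal sketch of such a read-only view, created with the Python driver; the source collection and the excluded fields are hypothetical:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["crm"]

# Create a view over the 'customers' collection that strips direct identifiers.
# Clients granted access to the view, but not the collection, never see those fields.
db.create_collection(
    "customers_pseudonymised",
    viewOn="customers",
    pipeline=[
        {"$project": {"name": 0, "email": 0, "phone": 0, "address": 0}}
    ],
)

# Query the view exactly like a normal collection; the underlying data is untouched.
for doc in db["customers_pseudonymised"].find().limit(3):
    print(doc)
```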

MongoDB Enterprise Advanced can also be configured with log redaction to prevent potentially sensitive information, such as personal identifiers, from being written to the database’s diagnostic log. Developers and DBAs who may need to access the logs for database performance optimization or maintenance tasks still get visibility to metadata, such as error or operation codes, line numbers, and source file names, but are unable to see any personal data associated with database events.
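
Log redaction is a server-side setting of MongoDB Enterprise Advanced; as a sketch, an administrator could enable it at runtime with the redactClientLogData parameter (it can equally be set in the server configuration at startup):

```python
from pymongo import MongoClient

# Requires an administrative connection to a MongoDB Enterprise Advanced server.
client = MongoClient("mongodb://admin:admin_password@localhost:27017/?authSource=admin")

# Omit field values (potentially PII) from the diagnostic log while keeping
# metadata such as error and operation codes visible.
client.admin.command("setParameter", 1, redactClientLogData=True)
```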

Encryption

Encryption can protect data in transit and at rest, enabling only authorized access. Should unauthorized users gain access to a network, server, filesystem, or database, the data can still be protected by encryption keys.

Support for Transport Layer Security (TLS) allows clients to connect to MongoDB over an encrypted network channel, protecting data in transit. In addition, MongoDB encrypts data at rest in persistent storage and in backups.
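
As a sketch of connecting over an encrypted channel with the Python driver (the hostname and CA file path are placeholders; recent driver versions use the tls* option names shown, while older releases use the equivalent ssl* options):

```python
from pymongo import MongoClient

# Encrypt data in transit between the application and the database,
# verifying the server certificate against a trusted CA.
client = MongoClient(
    "mongodb://db1.example.net:27017/",
    tls=True,
    tlsCAFile="/etc/ssl/certs/ca.pem",
)
print(client.admin.command("ping"))
```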

Using the MongoDB Atlas managed database service, TLS is the default and cannot be disabled. Traffic from clients to Atlas, and between Atlas cluster nodes, is authenticated and encrypted. Encryption-at-rest is an available, no-cost option for customers using the public cloud providers’ disk and volume encryption services.

Figure 3: End to End Encryption – Data In-Flight and Data At-Rest

MongoDB Enterprise Advanced also offers the Encrypted Storage Engine, making the protection of data at-rest an integral feature of the database. By natively encrypting database files on disk, administrators reduce both the management and performance overhead of external encryption options, while providing an additional level of defense. Only those staff with the appropriate database credentials can access encrypted personal data. Access to the database file on the server would not expose any stored personal information.

The storage engine encrypts each database with a separate key. MongoDB recommends encryption keys be rotated and replaced at regular intervals, and by performing rolling restarts of the replica set, keys can be rotated without database downtime. Database files themselves do not need to be re-encrypted when using a Key Management Interoperability Protocol (KMIP) service, thereby also avoiding the performance overhead incurred with key rotation.

Refer to the documentation to learn more about encryption in MongoDB.

Resilience and Disaster Recovery

To protect service availability and recover from events that cause data corruption or loss, MongoDB offers fault tolerance to systems failures, along with backup and recovery tools for disaster recovery.

Resilience

Using native replication, MongoDB maintains multiple copies of data in what are called replica sets. A replica set is a fully self-healing cluster distributed across multiple nodes to eliminate single points of failure. In the event a node fails, replica failover is fully automated, eliminating the need for administrators to intervene manually to restore database availability.

The number of replicas in a MongoDB replica set is configurable: a larger number of replicas will provide increased data availability and protection against database downtime (e.g., in case of multiple machine failures, rack failures, data center failures, or network partitions). Replica sets also provide operational flexibility by providing a way to upgrade hardware and software without requiring the database to be taken offline. Replica set members can be deployed both within and across physical data centers and cloud regions, providing resilience to regional failures.
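
A brief sketch of how an application connects to a replica set and requests durable writes; the hostnames and replica set name are placeholders:

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

# List several members; the driver discovers the full replica set, routes writes
# to the primary, and follows an automatic failover if the primary changes.
client = MongoClient(
    "mongodb://db1.example.net,db2.example.net,db3.example.net/?replicaSet=rs0"
)

# Require acknowledgement from a majority of members so committed writes
# survive the loss of any single node.
orders = client["crm"].get_collection(
    "orders", write_concern=WriteConcern(w="majority")
)
orders.insert_one({"subjectId": "abc123", "status": "created"})
```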

Disaster Recovery

Data can be compromised by a number of unforeseen events: failure of the database or its underlying infrastructure, user error, malicious activity, or application bugs. With a backup and recovery strategy in place, administrators can restore business operations by quickly recovering their data, enabling the organization to meet regulatory and compliance obligations.

The operational tooling provided as part of MongoDB Enterprise Advanced and the MongoDB Atlas managed database service can continuously maintain database backups for you. If MongoDB experiences a failure, the most recent backup is only moments behind the operational system, minimizing exposure to data loss. The tooling offers point-in-time recovery of replica sets and cluster-wide snapshots of sharded clusters. These operations can be performed without any interruption to database service. Administrators can restore the database to precisely the moment needed, quickly and safely. Automation-driven restores allow a fully configured cluster to be re-deployed directly from the database snapshots in just a few clicks, speeding time to service recovery.

You can learn more about backup and restore in MongoDB Enterprise Advanced from the Ops Manager documentation, and from the documentation for MongoDB Atlas.

Data Sovereignty: Data Transfers Outside of the EU

To support data sovereignty requirements, MongoDB zones allow precise control over where personal data is physically stored in a cluster. Zones can be configured to automatically “shard” (partition) the data based on the user’s location – enabling administrators to isolate EU citizen data to physical facilities located only in those regions recognized as complying with the GDPR. If EU policies towards storing data in specific regions change, updating the shard key range can enable the database automatically to move personal data to alternative regions.

Beyond geo-specific applications, zones can accommodate a range of deployment scenarios – for example supporting tiered storage deployment patterns for data lifecycle management, or segmenting data by application features or customers.

You can learn more about MongoDB zone sharding from the documentation.
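
As a hedged sketch of the zone configuration described above, issued through admin commands from the Python driver; the shard name, namespace, and shard key are hypothetical, and the collection is assumed to already be sharded on a key that starts with 'country':

```python
from pymongo import MongoClient

# Connect to the sharded cluster through a mongos router with admin privileges.
client = MongoClient("mongodb://admin:admin_password@mongos.example.net:27017/?authSource=admin")
admin = client.admin

# Associate a shard hosted in an EU data center with an "EU" zone.
admin.command("addShardToZone", "shardEU0", zone="EU")

# Route documents for a given country code to the EU zone
# (the upper bound of a zone key range is exclusive).
admin.command("updateZoneKeyRange", "crm.customers",
              min={"country": "DE"},
              max={"country": "DF"},
              zone="EU")
```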

Note that the MongoDB Atlas service can be provisioned into any one of a number of supported cloud provider regions, but must run within that region. A single cluster cannot currently span multiple regions.

Detect

Monitoring

Proactive monitoring of all components within an application platform is always a best practice. System performance and availability depend on the timely detection and resolution of potential issues before they present problems to users. Sudden and unexpected peaks in memory and CPU utilization can, among other factors, be indicative of an attack, which can be mitigated if administrators are alerted in real time.

The operational tooling provided with MongoDB Enterprise Advanced and the MongoDB Atlas managed database service provides deep operational visibility into database operations. Featuring charts, custom dashboards, and automated alerting, MongoDB’s operational tooling tracks 100+ key database and systems health metrics, including operations counters, memory, CPU, and storage consumption, replication and node status, open connections, queues, and many more. The metrics are securely reported to a management UI where they are processed, aggregated, alerted on, and visualized in a browser, letting administrators easily track the health of MongoDB in real time. Metrics can also be pushed to Application Performance Management platforms such as AppDynamics and New Relic, supporting centralised visibility into the global IT estate.

Figure 4: Ops Manager Offers Charts, Custom Dashboards & Automated Alerting

Custom alerts can be generated when key metrics are out of range. These alerts can be sent via SMS and email, or integrated into existing incident management and collaboration systems such as PagerDuty, Slack, HipChat, and others to proactively warn of potential issues and help prevent outages or breaches.

The operational tooling also enables administrators to roll out upgrades and patches to the database without application downtime. Using the MongoDB Atlas database service, patches are automatically applied, removing the overhead of manual operator intervention.
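
Beyond the bundled tooling and alerting, the same health metrics can be sampled directly from the server for custom monitoring scripts; a brief sketch using the serverStatus command:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

# serverStatus returns the raw counters behind the charts: operation counters,
# memory, connections, queues, replication state, and more.
status = client.admin.command("serverStatus")
print("operations:", status["opcounters"])
print("current connections:", status["connections"]["current"])
print("resident memory (MB):", status["mem"]["resident"])
```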

Auditing

By maintaining audit trails, changes to personal data and database configuration can be captured for each client accessing the database, providing a log for compliance and forensic analysis by data controllers and supervisory authorities.

The MongoDB Enterprise Advanced auditing framework logs all access and actions executed against the database, including:

  • Administrative actions such as adding, modifying, and removing database users, schema operations, and backups.
  • Authentication and authorization activities, including failed attempts at accessing personal data.
  • Read and write operations to the database.

Administrators can construct and filter audit trails for any operation against MongoDB Enterprise Advanced. They can capture all activities, or just a subset of actions, based on the requirements stipulated by the data controller and auditors. For example, it is possible to log and audit the identities of users who accessed specific documents, and any changes they made to the database during their session. Learn more from the MongoDB Enterprise Advanced auditing documentation.
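
Auditing itself is configured on the MongoDB Enterprise Advanced server; as an illustrative sketch (not a complete deployment recipe), a filter that limits the audit trail to authentication attempts and user or role management actions could be generated like this and supplied through the server’s audit filter setting:

```python
import json

# Record only authentication attempts and user/role management actions.
# The resulting JSON document is suitable for the mongod audit filter option,
# configured alongside an audit destination such as a JSON log file.
audit_filter = {
    "atype": {"$in": ["authenticate", "createUser", "dropUser", "updateUser",
                      "createRole", "dropRole"]}
}
print(json.dumps(audit_filter))
```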

The MongoDB Atlas managed database service provides an audit log of administrative actions, such as the deployment and scaling of clusters, and the addition or removal of users from an Atlas group. Access to the database logs is also provided, which controllers can use to track user connections to the database.

Services to Help Your Teams Create a Secure Database Environment

The GDPR text explicitly states the requirement for training in “Binding Corporate Rules”, Article 47 (clause 2n), which calls for:

“the appropriate data protection training to personnel having permanent or regular access to personal data.”

MongoDB provides extensive training and consulting services to help customers apply best security practices:

  • The MongoDB Security course is a no-cost, 3-week online training program delivered by MongoDB University.
  • MongoDB University also offers a range of both public and private training for developers and operations teams, covering best practices in using and administering MongoDB.
  • MongoDB Global Consulting Services offer a range of packages covering Health Checks, Production Readiness Assessments, and access to Dedicated Consulting Engineers. The MongoDB consulting engineers work directly with your teams to guide development and operations, ensuring skills transfer to your staff.

Wrapping Up Part 3

That wraps up the third part of our 4-part blog series. In Part 4, we’ll examine how the GDPR can help in customer experience, and provide a couple of case studies.

Remember, if you want to get started right now, download the complete GDPR: Impact to Your Data Management Landscape white paper today.

Disclaimer
For a full description of the GDPR’s regulations, roles, and responsibilities, it is recommended that readers refer to the text of the GDPR (Regulation (EU) 2016/679), available from the Official Journal of the European Union, and refer to legal counsel for the interpretation of how the regulations apply to their organization. Further, in order to effectively achieve the functionality described in this blog series, it is critical to ensure that the database is implemented according to the specifications and instructions detailed in the MongoDB security documentation. Readers should consider engaging MongoDB Global Consulting Services to assist with implementation.

GDPR: Impact to Your Data Management Landscape: Part 2

Mat Keep
September 05, 2017
Business

Welcome to part 2 of our 4-part blog series.

  • In part 1, we provided a primer into the GDPR – covering its rationale, and key measures
  • In today’s part 2, we’ll explore what the GDPR means for your data platform
  • In part 3, we’ll discuss how MongoDB’s products and services can support you in your path to compliance
  • Finally, in part 4, we’ll examine how the GDPR can help in customer experience, and provide a couple of case studies.

If you can’t wait for all 4 parts of the series, but would rather get started now, download the complete GDPR: Impact to Your Data Management Landscape white paper today.

Mapping GDPR to Required Database Capabilities

Like other regulations designed to enforce data security and privacy standards (e.g., HIPAA, PCI DSS, SOX, FISMA, FERPA), GDPR compliance can be achieved only by applying a combination of controls that we can summarize as People, Processes, and Products:
  • “People” defines specific roles, responsibilities, and accountability.
  • “Processes” defines operating principles and business practices.
  • “Products” defines technologies used for data storage and processing.

As with any data security regulation, enabling controls in a database storing personal data is just one step towards compliance – people and processes also are essential. There are, however, specific requirements stated in the GDPR text that define a set of controls organizations need to implement across their data management landscape. We can group these requirements into three areas:

  • Discover: identify the data that is subject to the regulation.
  • Defend: implement measures to protect discovered data.
  • Detect: identify a breach against that data, and remediate security and process gaps.

The following section of the post examines GDPR requirements, and maps them back to the required database capabilities. Please note that the list below is illustrative only, and is not designed to be exhaustive.

Discover

Before implementing security controls, an organization first needs to identify the personal data stored in its databases, and for how long it is permitted to retain that data. It also needs to assess the potential impact to the individual should the personal data be disclosed to an unauthorized party.

Identification of Impact to Personal Data

The GDPR requires organizations to undertake a Data Protection Impact Assessment, documented in Article 35 (clause 1) of the GDPR text, stating:

“Where a type of processing in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal data.”

It is therefore important to have access to tools that enable the data controller to quickly and conveniently review their database content, and as part of an ongoing discovery process, to inspect what additional data will be captured as new services are under development.

Retention of Personal Data

As noted in “Information to be Provided”, Article 13 (clause 2a), the GDPR text specifies that at the time data is collected from an individual, the organization must state:

“the period for which the personal data will be stored, or if that is not possible, the criteria used to determine that period”

Therefore, a required capability that the organization will need to implement is the ability to identify personal data, and securely erase it from the database once the expiration period has been reached, or an individual specifically requests erasure. As a result, storage, including backups, should have the ability to provably erase data as requested by the owner.

Defend

Once the organization has conducted its Discover phase, with an Impact Assessment and expiration policies defined, it needs to implement the controls that will protect citizen data.

General Security Requirements of the GDPR

The “Security of Processing”, Article 32 (clause 1) provides an overview of security controls an organization needs to enforce:

“….the controller and the processor shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk, including inter alia as appropriate:

(a) the pseudonymisation and encryption of personal data;
(b) the ability to ensure the ongoing confidentiality, integrity, availability and resilience of processing systems and services;
(c) the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident;
(d) a process for regularly testing, assessing and evaluating the effectiveness of technical and organisational measures for ensuring the security of the processing.”

Each of the bulleted clauses is further expanded upon within the GDPR text, as follows.

Access Control

The GDPR emphasizes the importance of ensuring that only authorized users can access personal data. As stated in the text “Data Protection by Design and by Default”, Article 25 (clause 2):

“The controller shall implement appropriate technical and organisational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed”

This requirement is further reinforced in Article 29, “Processing Under the Authority of the Controller or Processor”, stating

“The processor and any person acting under the authority of the controller or of the processor, who has access to personal data, shall not process those data except on instructions from the controller….”

Within the database, it should be possible to enforce authentication controls so that only clients (e.g., users, applications, administrators) authorized by the data processor can access the data. The database should also allow data controllers to define the specific roles, responsibilities, and duties each client can perform against the data. For example, some clients may be permitted to read all of the source data collected on a data subject, while others may only have permissions to access aggregated data that contains no reference back to personal identifiers. This approach permits a fine-grained segregation of duties and privileges for each data processor.

Pseudonymisation & Encryption

In the event of a breach, the pseudonymisation and encryption of data is designed to prevent the identification of any specific individual from compromised data. In the definitions section of the GDPR text, pseudonymisation means:

“….the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information”

Clause 28 of the general regulations states:

“The application of pseudonymisation to personal data can reduce the risks to the data subjects concerned and help controllers and processors to meet their data-protection obligations.”

One of the most effective and efficient means of pseudonymising data is based on the access control privileges defined in the previous step. The database redacts personal identifiers by filtering query results returned to applications.

Encryption is specifically referenced in Article 32 (clause 1) referenced above. The advantages of encryption are further expanded in the text for “Communication of a Personal Data Breach to the Data Subject”, Article 34 (clause 3a), stating communication to the data subject is not required if:

“the controller has implemented appropriate technical and organisational protection measures, and those measures were applied to the personal data affected by the personal data breach, in particular those that render the personal data unintelligible to any person who is not authorised to access it, such as encryption;”

The database should provide a means to encrypt both data “in-transit” using network connections, and data “at-rest” using storage and backups.

Resilience and Disaster Recovery

As stated in clauses (b) and (c) of “Security of Processing”, Article 32, cited above, systems and service availability, along with a means to restore data in a timely fashion, are both core operational requirements of the GDPR.

As a result, the database needs to offer fault tolerance to systems failures, along with backup and recovery mechanisms to enable disaster recovery.

Data Sovereignty: Data Transfers Outside of the EU

Chapter 5 of the GDPR is dedicated to how the transfer of personal data outside of the EU should be handled – defining when such transfers are permissible and when they are not. Key to understanding data transfer is that EU citizen rights under the GDPR accompany the data to wherever it is moved globally, where the same safeguards must be applied. To summarize the chapter, Article 45 (clause 1) states:

“A transfer of personal data to a third country or an international organisation may take place where the Commission has decided that the third country, a territory or one or more specified sectors within that third country, or the international organisation in question ensures an adequate level of protection.”

To support globally distributed applications, organizations are increasingly distributing data across data centers and cloud facilities located in multiple countries. In the context of the GDPR, it should be possible for the database to enforce data sovereignty policies by distributing and storing EU citizen data only in regions recognized as complying with the regulation.

Detect

In the event of a data breach, the organization must be able to detect and report on the issue in a timely fashion, and also to generate a record of what activities had been performed against the data.

Monitoring and Reporting

Monitoring is always critical to identifying potential exploits; the closer to real time, the better the chance of limiting the impact of a data breach. For example, sudden peaks in database resource consumption can indicate an attack in progress at the very moment it happens.

In the GDPR text “Notification of a Personal Data Breach to the Supervisory Authority”, Article 33 (clause 1), it is stated:

“In the case of a personal data breach, the controller shall without undue delay and, where feasible, not later than 72 hours after having become aware of it, notify the personal data breach to the supervisory authority….”

As a result, the database should offer management tools that enable constant monitoring of database behavior to proactively mitigate threats, and that enable the organization to report on any breaches within the specified timeframes.

Auditing

“Records of Processing Activities”, Article 30 (clause 1) emphasizes the requirement to maintain a log of activities performed against the data:

“….Each controller and, where applicable, the controller's representative, shall maintain a record of processing activities under its responsibility”

“Processor”, Article 28 (clause 3H) further expands on the requirement for auditing, stating that the data processor:

“makes available to the controller all information necessary to demonstrate compliance with the obligations laid down in this Article and allow for and contribute to audits, including inspections, conducted by the controller or another auditor mandated by the controller.”

The database needs to offer a mechanism to record database activity, and present that activity for forensic analysis when requested by the controller.

Wrapping Up Part 2

That wraps up the second part of our 4-part blog series. In Part 3, we’ll discuss how MongoDB’s products and services can help you meet the requirements we’ve discussed today.

Remember, if you want to get started right now, download the complete GDPR: Impact to Your Data Management Landscape white paper today.

Disclaimer
For a full description of the GDPR’s regulations, roles, and responsibilities, it is recommended that readers refer to the text of the GDPR (Regulation (EU) 2016/679), available from the Official Journal of the European Union, and refer to legal counsel for the interpretation of how the regulations apply to their organization. Further, in order to effectively achieve the functionality described in this blog series, it is critical to ensure that the database is implemented according to the specifications and instructions detailed in the MongoDB security documentation. Readers should consider engaging MongoDB Global Consulting Services to assist with implementation.

GDPR: Impact to Your Data Management Landscape: Part 1

Mat Keep
August 29, 2017
Business

The timeline for compliance with the European Union’s General Data Protection Regulation (GDPR) is fast approaching. From May 25th 2018, any organization failing to satisfy the new regulation faces fines of up to 4% of global revenues, or €20m – whichever is greater – as well as the potential suspension of any further data processing activities. Irrespective of whether you have a physical presence in the EU or not, if you are handling EU citizen data in any way, you are subject to the GDPR.

That said, the regulation shouldn’t be viewed as some new burdensome red-tape imposed by faceless bureaucrats. Rather, for more progressive organizations, it presents an opportunity to transform how they engage with their customers in the digital economy.

In this 4-part blog series, we’re going to dive deeper into the regulation, and what it means to you:

  • In today’s part 1, we’ll provide a primer into the GDPR – covering its rationale, and key measures
  • In part 2, we’ll explore what the GDPR means for your data platform
  • In part 3, we’ll discuss how MongoDB’s products and services can support you in your path to compliance
  • Finally, in part 4, we’ll examine how the GDPR can help in customer experience, and provide a couple of case studies.

If you can’t wait for all 4 parts of the series, but would rather get started now, download the complete GDPR: Impact to Your Data Management Landscape whitepaper today.

GDPR Rationale

Cyber-crime is forecast to cost the global economy $6 trillion by 2021, up from $3 trillion in 2016. Described by some as the “greatest threat to every company in the world”, public concern for the safety of data is growing – not just in how criminals might use stolen data to commit fraud, but also in how personal data is used by the organizations we engage with. Many people are asking whether data provided in exchange for goods, services, and employment could be used to:

  • Damage our reputations?
  • Deny us access to the healthcare or financial services we might need?
  • Discriminate against us based on our political views, religion, associations, or ethnicity?
  • Reduce our autonomy, freedom, and individuality?

The [European Union (EU) General Data Protection Regulation (GDPR) 2016/679](http://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN) is designed to confront these concerns. Protection and privacy of individuals – “data subjects” in GDPR terminology – is not just a legal obligation placed on organizations collecting and processing our data; the GDPR also entrenches data privacy as a fundamental human right of all EU citizens. The GDPR was introduced May 24, 2016, and will be enforced from May 25, 2018.

A range of requirements and controls are defined by the GDPR to govern how organizations collect, store, process, retain, and share the personal data of EU citizens. However, Gartner predicts that more than 50% of companies affected by the GDPR will not be in full compliance with its requirements by the end of 2018 – nine months after the regulation comes into force.

The existing EU data protection legislation (Data Protection Directive 95/46/EC) was introduced back in 1995, but was increasingly regarded as insufficient, both for today’s privacy demands, and those envisaged in the future:

  • Implementation varied across each member state, creating complexity, uncertainty, and cost. Inconsistencies affected both user trust in an emerging digital economy and EU competitiveness in the global market.
  • Technology enhancements over the past 20+ years now allow both private enterprises and public authorities to collect and make use of personal data on an unprecedented scale in order to pursue their activities. The emergence of social networking, cloud computing, eCommerce, web services, mobile devices and apps, the Internet of Things, machine learning, and much more renders the existing regulation inadequate.

The reform introduced by the GDPR is designed to provide EU citizens with more control over their own personal data. In this context, the scope of personal data has been expanded – it includes anything that can uniquely identify an individual, such as a name, an identification number, location data, an online identifier, or one or more factors specific to the physical, physiological, genetic, mental, economic, cultural, or social identity of that individual.

Key Measures of the GDPR

In EU research, nine out of ten Europeans expressed concern about mobile apps collecting personal data without their consent, and seven out of ten worried about the potential use that companies may make of the data they disclosed. The GDPR attempts to address these concerns through a range of new measures:

  • Individuals must provide explicit consent to data collection – “consent by default” is no longer valid. The organization seeking consent must also provide clear information on how that data will be used, for how long it will be retained, and how it will be shared with third parties. Individuals can retract consent at any time, without prejudice. Additional permissions must be requested from the individual if the data is to be used for processing purposes beyond the original consent.
  • A "right to be forgotten", also known as “right to erasure”, requires deletion of data when owners ask for it to no longer be retained, and there is no legitimate reason for an organization to refuse the request.
  • Organizations must provide easier access to an individual’s data, enabling them to review what data is stored about them and how it is processed, who it is shared with, along with the ability to migrate that data between service providers without restriction.
  • A right to review how automated decisions computed against personal data have been made, for example, by machine learning algorithms declining transactions based on risk scores.
  • Disclosure within 72 hours must be made to a member state’s “supervisory body” (a member state’s independent public authority overseeing GDPR implementation) when personal data has been breached, enabling individuals to be informed and take appropriate remedial action.
  • Data protection has to be by design and by default, requiring data protection controls to be built into products and services from the earliest stage of development, and the adoption of privacy-friendly default settings in all applications collecting personal data.
  • Punitive financial penalties (e.g., up to 4% of global revenue or €20m) can be imposed on any organization proven not to comply with the regulations.

The new regulations seek to provide clarity and consistency in how privacy rules are applied, not just across the EU, but also globally to every organization processing citizen data as part of offering products and services in the EU.

The GDPR introduces specific terminology to define roles and responsibilities within organizations, including:

  • Data Protection Officer (DPO), an individual employed by the data controller or processor, with responsibility for advising on GDPR regulation, reporting to the highest management level. The DPO is ultimately answerable to the local supervisory authority.
  • Data controller, typically the organization with whom the data subject (the individual) is sharing the data.
  • Data processor, an organization and/or individual working on behalf of the controller, e.g., a direct employee such as a business analyst or a developer, or an external service provider, such as a credit rating agency or a payroll processor. A data processor is any entity or individual with access to personal data.

GDPR’s Definition of a Data Breach

It is very important to understand what a data breach means in the context of this new regulation. The GDPR applies a much broader definition than just the loss of confidentiality or unauthorized processing of personal data, demonstrating that data protection extends beyond narrow concepts of access: it also encompasses availability and integrity. The GDPR text states:

“‘personal data breach’ means a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored or otherwise processed”

Wrapping Up Part 1

That wraps up the first part of our 4-part blog series. In Part 2, we’ll examine specific GDPR requirements, and map them back to a set of required database capabilities.

Remember, if you want to get started right now, download the complete GDPR: Impact to Your Data Management Landscape whitepaper today.

Disclaimer

For a full description of the GDPR’s regulations, roles, and responsibilities, it is recommended that readers refer to the text of the GDPR (Regulation (EU) 2016/679), available from the Official Journal of the European Union, and refer to legal counsel for the interpretation of how the regulations apply to their organization. Further, in order to effectively achieve the functionality described in this blog series, it is critical to ensure that the database is implemented according to the specifications and instructions detailed in the MongoDB security documentation. Readers should consider engaging MongoDB Global Consulting Services to assist with implementation.

Modernizing and Protecting Data Center Operations with MongoDB and Dell EMC

As part of our ongoing series highlighting our partner ecosystem, we recently sat down with Dell EMC Global Alliances Director, Tarik Dwiek, and Director of Product Management, Philip Fote, to better understand how Dell EMC and MongoDB partner to help modernize and protect data center operations.

What do you want customers to know about the MongoDB and Dell EMC relationship?
Tarik: We have been partnering for over a year on Dell EMC Flash and software-defined platforms, and the traction has been amazing. To fully realize the potential of MongoDB, customers need to modernize their infrastructure and transform their data center operations. At Dell EMC, our strategy is to help customers achieve this modernization by taking advantage of 4 key pillars: flash, software-defined, scale-out, and cloud enabled solutions. In addition, we are working on a data protection strategy for enterprise-grade backup and restore of MongoDB.

Can you further explain how this strategy relates directly to MongoDB?
Tarik: First off, MongoDB unlocks the ability for unparalleled performance at the database layer. This is where Flash is essential, meeting these performance requirements with compelling economics. Second, scale-out architectures, like MongoDB, have become a requirement because customers are generating orders of magnitude more data. Third, many organizations are implementing a software-defined data center. This model automates the deployment and configuration of IT services, resulting in agility and flexibility for managing data services. Finally, we want to ensure that the on-prem data center can leverage public cloud economics non-disruptively.

Tell us more about Dell EMC Data Protection solutions.
Philip: At Dell EMC, we believe data needs to be protected wherever it lives and no matter what happens. With this in mind, we start with the reality that data protection cannot be one size fits all in terms of service levels. Protection and availability should be based on data value and service levels that align to business objectives. Dell EMC looks at data protection as a continuum that spans over many protection tiers, including availability, replication, backup, snapshots, and archive; we offer products and solutions that span this continuum. With this, customers can tailor their data protection solution to best serve their specific needs.

What is Data Domain?
Philip: Dell EMC Data Domain systems deliver industry leading protection storage. Data Domain can reduce the amount of disk storage needed to retain and protect data by ratios of 10-30x and greater. It can scale up to 150 PB of logical capacity managed by a single system and with throughput up to 68 TB/hour. Data Domain systems make it possible to complete more backups in less time and provide faster, more reliable restores. The Data Domain Operating System (DD OS) is the intelligence behind Data Domain systems that makes them the industry’s most reliable and cloud-enabled protection storage.

What is DD Boost?
Philip: DD Boost provides advanced integration between Data Domain systems and leading backup and enterprise applications. DD Boost distributes parts of the deduplication process to the backup server or application client to speed backups by up to 50 percent and reduce bandwidth requirements by up to 99 percent.

What is DD Boost file system plug-in?
Philip: With the BoostFS file system plug-in, DD Boost is now immediately available to workloads that previously could not use it, via a standard file system interface. BoostFS can be deployed in minutes, reducing backup windows and storage capacity requirements.

Why did you choose to certify MongoDB with BoostFS?
Philip: Dell EMC is committed to providing customers a holistic data protection strategy that evolves with changes in the market. The adoption of NoSQL open source databases is one of those changes, and MongoDB is a market leader. This new partnership with the Data Domain ecosystem will better allow our customers to add MongoDB workloads to their existing infrastructure. BoostFS provides all the benefits and efficiencies of DD Boost, and does so in a simple, cost effective manner. With Dell EMC and MongoDB, customers are now given a valuable, synergistic solution built from two industry leaders.

What MongoDB configurations are supported with BoostFS?
  • Database: MongoDB v2.6, 3.0, 3.2, and 3.4 (future)
  • Storage Engines: MMAPv1 and WiredTiger
  • Backup Tools: Ops Manager 2.0.7, mongodump
  • Data Domain: All platforms and DDVE
  • DD OS: v6.0
  • BoostFS: v1.0

For more information or to ask questions about BoostFS with MongoDB, please visit the Data Domain Community web site.

Where do you see this relationship going?
Philip: As the Product Manager for DD Boost and BoostFS, part of my responsibilities include running the partner ecosystem for DD Boost, so I have a lot of experience in dealing with partners. When working in that capacity, it’s easy to separate the good from the bad. Working with MongoDB has been great from the start – they have been responsive, flexible, and proactive in solving problems. Both firms are excited about the solution being offered today, and discussions have already started on extending this solution to cloud use cases.

What is the main use case for MongoDB with BoostFS?
Philip: One of the main use cases for BoostFS is to provide an enterprise backup and recovery solution with the option to replicate to a remote site. This secondary site can be used for disaster recovery or long-term retention. The BoostFS plug-in resides on the MongoDB Ops Manager server as a Linux file system mount point, and the DD Boost protocol transports the data written to the file system by Ops Manager to and from the Data Domain system. Backups are then replicated to a remote Data Domain system using MTree replication.

MongoDB and Boost

What are the benefits you’ll get with BoostFS for MongoDB as opposed to Network File System (NFS)?
Philip: BoostFS offers advanced features while retaining the user experience you get with NFS, including load balancing and failover plus security. The chart below shows the benefits of BoostFS over NFS. Details on these features can be found on DellEMC.com or at the Data Domain User Community site.

BoostFS for MongoDB

What exciting things can we look forward to next from MongoDB and Dell EMC?
Tarik: We have invested heavily in hyper-converged infrastructure. More and more customers are seeing the benefits in shifting their focus from maintaining infrastructure to innovating their application. We see tremendous potential in validating and eventually embedding MongoDB into our converged offerings.

Thank you for speaking with us, Tarik and Philip. If you’d like to learn more:

Dell EMC and MongoDB Solutions Brief



Transforming Customer Experiences with Sitecore and MongoDB Enterprise

Modern customers live across multiple channels, creating data with every tweet, comment, swipe, and click. They expect deep personalization based on their interactions with a company's brand and won’t settle for anything less than instant gratification. Customer data is a company’s lifeblood. But it can’t help a company if it’s strewn across an organization, locked away in siloed systems. Designed to alleviate IT organizations’ data burden and empower marketers, Sitecore Experience Database (xDB) is a Big Marketing Data repository that collects all customer interactions, connecting them to create a comprehensive, unified view of the individual customer.

Sitecore is a leader in this space, and its product makes data available to marketers in real time for automated interactions across all channels. xDB is a critical component of the Sitecore Experience Platform™, a single platform that allows you to create, deliver, measure, and optimize experiences for your prospects and customers. xDB is powered by MongoDB; it collects and connects all of a customer’s interactions with a company's brand, including those on other customer-facing platforms such as ERP, CRM, customer service, and non-Sitecore websites, creating a comprehensive, unified view of each customer. Those views are available to your marketers in real time to help you create tailored customer experiences across all your channels. Sitecore is one of our best and most strategic partners, and we are proud to say the relationship is stronger than ever.

Recently, Sitecore launched Sitecore® Experience Platform 8.2, which includes new features such as advanced publishing, e-commerce enhancements, and data-at-rest encryption. Encryption is a critical component of any application, and as a best practice Sitecore recommends that all xDB deployments encrypt data at rest. MongoDB provides comprehensive, native at-rest database encryption through the WiredTiger storage engine, so there is no need for third-party tools that only encrypt files at the application, file system, or disk level. MongoDB’s WiredTiger storage engine is fully integrated and allows enterprises to safeguard their xDB deployments by encrypting and securing customer data at rest.

MongoDB is the leading non-relational database on the market and an integral part of xDB. It is the ideal database for collecting varied interactions and connecting them to create a “single view” of your customers.

Store and analyze anything

Instead of using rows and columns, MongoDB stores data using a flexible document data model, allowing you to ingest, store, and analyze any customer interaction or data type.
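As a rough illustration of that flexibility, the following sketch uses the PyMongo driver to store two very differently shaped interactions in the same collection. The database, collection, and field names are hypothetical, not part of the actual xDB schema.

```python
# Minimal sketch of MongoDB's flexible document model using PyMongo.
# Collection and field names are illustrative, not an actual xDB schema.
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
interactions = client["xdb_demo"]["interactions"]

# Two interactions with different shapes coexist in one collection --
# no schema migration is needed to add the nested "location" field.
interactions.insert_many([
    {
        "contactId": "c-1001",
        "channel": "web",
        "event": "click",
        "page": "/pricing",
        "timestamp": datetime.utcnow(),
    },
    {
        "contactId": "c-1001",
        "channel": "mobile",
        "event": "push_open",
        "campaign": "spring-sale",
        "location": {"city": "Boston", "country": "US"},
        "timestamp": datetime.utcnow(),
    },
])
```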

Scale without limit

MongoDB enables you to handle up to hundreds of billions of visits or interactions per year. When your deployment hits the scalability limits of a single server (e.g., the CPU, memory, or storage is fully consumed), MongoDB uses a process called sharding to partition and distribute data across multiple servers. Automatic load balancing ensures that performance is consistent as your data grows. The database runs on commodity hardware so you can scale on-demand while keeping costs low.
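A minimal sketch of what that configuration looks like with PyMongo follows; the database, collection, and shard key are assumptions for illustration, and the commands must run against the mongos router of a sharded cluster.

```python
# Sketch: enabling sharding for a hypothetical xDB-style interactions collection.
# Requires a running sharded cluster; connect through a mongos router.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")

# Allow the database to be distributed across shards.
client.admin.command("enableSharding", "xdb_demo")

# Partition the collection on a hashed contact identifier so documents
# spread evenly across shards as data volumes grow.
client.admin.command(
    "shardCollection",
    "xdb_demo.interactions",
    key={"contactId": "hashed"},
)
```

A hashed key is just one option; the best shard key for a real deployment depends on your query patterns and write distribution.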

Minimize downtime

MongoDB supports native data replication with automated failover in the event of an outage. High availability and data redundancy are built into each MongoDB replica set, which serves as the basis for all production deployments of the database. This self-healing architecture keeps the database available to your applications even when individual servers fail, so your team can keep delivering the best customer experience.
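The sketch below shows what this means for application code; the hostnames and replica set name are placeholders. The driver discovers the replica set members and, after a failover, automatically routes writes to the newly elected primary with no application changes.

```python
# Sketch: connecting to a three-member replica set with PyMongo.
# Hostnames and the replica set name ("rs0") are placeholders.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://node1.example.net,node2.example.net,node3.example.net"
    "/?replicaSet=rs0",
    w="majority",   # acknowledge writes only once a majority of members has them
)

sessions = client["xdb_demo"]["sessions"]

# If the primary fails, the driver detects the newly elected primary and
# continues routing writes to it automatically.
sessions.insert_one({"contactId": "c-1001", "active": True})
```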

Deploy anywhere

MongoDB can be deployed in your data center or in the cloud. MongoDB management tools that provide monitoring, backup, and operational automation are available for each type of deployment.

In xDB, the collection database acts as a central repository for storing contact, device, interaction, history and automation data. An optimal collection database configuration helps organizations increase the availability, scalability, and performance of their Sitecore deployments.

With MongoDB, companies can ingest, store, and analyze varied data from billions of visits with ease. MongoDB scales horizontally across commodity servers, allowing customers to cost-effectively grow their deployments to handle increasing data volumes or throughput.

We offer a number of products and customized services to ensure your success with Sitecore xDB. If you’re interested in learning more about how we can help, [click here](https://www.mongodb.com/lp/contact/consulting/mongodb-deployment-sitecore?jmp=blog) and a member of our team will be in touch with you shortly.

MongoDB Enterprise Advanced is the best way to run MongoDB in your data center. It includes the advanced features and the round-the-clock, enterprise-grade support you need to take your deployment into production with the utmost confidence. Features include:

  • Advanced Security
  • Commercial License
  • Management Platform
  • Enterprise Software Integration
  • Platform Certification
  • On-Demand Training
  • Enterprise-grade support, available 24 x 365

The MongoDB Deployment for Sitecore consulting engagement helps you create a well-designed plan to deploy a highly available and scalable Sitecore xDB. Our consulting engineer will collaborate with your teams to configure MongoDB’s replication and sharding features to satisfy your organization’s requirements for Sitecore xDB availability and performance.

Click here to learn more


About the Author - Alan Chhabra

Alan is responsible for Worldwide Partners at MongoDB which include System Integrators, ISVs, and Technology Alliances. Before joining the company, Alan was responsible for WW Cloud & Data Center Automation Sales at BMC Software managing a $200M annual revenue business unit that touched over 1000 customers. Alan has also held senior sales, services, engineering & IT positions at Egenera (a cloud pioneer), Ernst & Young consulting, and the Charles Stark Draper Laboratory. Alan is a graduate of the Massachusetts Institute of Technology, where he earned his B.S. in mechanical engineering and his Masters in aerospace engineering.

Leaf in the Wild: KPMG France Enters the Cloud Era with New MongoDB Data Lake

Love it or loathe it, the term “big data” continues to gain awareness and adoption in every industry. No longer just the preserve of internet companies, “big data” applications are now being built by traditional businesses in ways that were unimaginable just a few years ago.

A great example of this is KPMG France’s deployment of a MongoDB-based data lake to support its accounting suite, Loop, and the release of its industry-first financial benchmarking service – enabling KPMG France customers to unlock new levels of insight into how each of their businesses is really performing. In the true spirit of big data, this application would have overwhelmed the capabilities of traditional data management technologies. I spoke with Christian Taltas, Managing Director of KPMG France Technologies Services, to learn more.

Can you start by telling us about KPMG France?
KPMG is one of the world’s largest professional services firms operating as independent businesses in 155 countries, with 174,000 staff. KPMG provides audit, tax and advisory services used by corporations, governments and not-for-profit organizations.

KPMG France provides accounting services to 65,000 customers. I am the managing director of KPMG Technologies Services (KTS), a software company subsidiary of KPMG France. KTS developed Loop, a complete collaborative accounting solution which is used by KPMG France’s Certified Public Accountants (CPAs) and their clients.

Please describe how you use MongoDB.
MongoDB is the database powering the Loop accounting suite, used by KPMG’s 4,800 CPAs. The suite is also currently used in collaboration with around 2,000 of KPMG’s customers. We are expecting more than 20,000 customers to adopt Loop’s collaborative accounting within the next 18 months.

What services does MongoDB provide for the accounting suite?
It serves multiple functions for the suite:

Data Lake: All raw accounting data from our customers’ business systems, such as sales data, invoices, bank statements, cash transactions, expenses, payroll and so on, is ingested from Microsoft SQL Server into MongoDB. This data is then accessible to our CPAs to generate the customer’s KPIs. A unique capability we have developed for our customers is financial benchmarking. We can use the data in the MongoDB data lake to allow our customers to benchmark their financial performance against competitors operating in the same industries within a specified geographic region. They can compare salary levels, expenses, margin, marketing costs – in fact almost any financial metric – to help determine their overall market competitiveness against other companies operating in the same industries, regions and markets. The MongoDB data lake enables us to manage large volumes of structured, semi-structured, and unstructured data, against which we can run both ad-hoc and predefined queries supporting advanced analytics and business intelligence dashboards. We are continuously loading new data to the data lake, while simultaneously supporting thousands of concurrent users.
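To give a flavor of the kind of benchmarking query this enables, the sketch below computes average margin and payroll costs by industry within a region using the aggregation pipeline. The collection and field names are assumptions for illustration, not KPMG’s actual schema.

```python
# Sketch: a benchmarking-style aggregation over a hypothetical data lake
# collection of per-company financial records.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
financials = client["datalake_demo"]["company_financials"]

pipeline = [
    # Restrict the benchmark to one region and fiscal year.
    {"$match": {"region": "Ile-de-France", "year": 2016}},
    # Average key metrics per industry so a company can compare itself
    # against peers operating in the same market.
    {"$group": {
        "_id": "$industry",
        "avgMargin": {"$avg": "$margin"},
        "avgPayroll": {"$avg": "$payrollCosts"},
        "companies": {"$sum": 1},
    }},
    {"$sort": {"avgMargin": -1}},
]

for row in financials.aggregate(pipeline):
    print(row)
```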

Metadata Management: Another unique feature of our accounting suite is the ability to customize reporting for each customer, based on specific criteria they want to track. For example, a restaurant chain will be interested in different metrics than a construction company. We enable this customization by creating a unique schema for each customer, inherited from a standard business application schema, and then written to MongoDB. It stores the schema classes for each customer, which are then applied at run time when accounts and reports are generated. The Loop application has been designed as a business framework that generates reports in real time, running on top of Node.js. MongoDB helps us manage the entire application codebase in order to deliver the right schemas and application business modules to each user depending on their role and profile, e.g., bookkeeper, CPA, or sales executive. It is a very powerful feature, enabled by the flexibility of the MongoDB document data model, that we could not have implemented with the constraints imposed by a traditional relational data model.
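A much-simplified sketch of that inheritance pattern is shown below; the document layout, collection name, and merge logic are assumptions for illustration, not the actual Loop metadata model.

```python
# Sketch: per-customer report schemas inheriting from a standard base schema.
# Document layout is illustrative, not the actual Loop metadata model.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
schemas = client["loop_demo"]["report_schemas"]

# One-off setup: a base schema plus one customer-specific override document.
schemas.insert_many([
    {"_id": "base", "fields": ["revenue", "expenses", "margin"]},
    {"_id": "customer-42", "inherits": "base", "extraFields": ["tableTurnover"]},
])

def resolve_schema(customer_id):
    """Merge a customer's schema overrides with the base schema at run time."""
    custom = schemas.find_one({"_id": customer_id}) or {}
    base = schemas.find_one({"_id": custom.get("inherits", "base")}) or {}
    return base.get("fields", []) + custom.get("extraFields", [])

print(resolve_schema("customer-42"))  # ['revenue', 'expenses', 'margin', 'tableTurnover']
```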

Caching Layer: The user experience is critical, so we use MongoDB as a high-speed layer to manage user authentication and sessions.
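A session cache of this kind can be sketched in a few lines, with a TTL index letting MongoDB expire idle sessions automatically; the collection, field names, and expiry window below are assumptions.

```python
# Sketch: using a MongoDB collection as a session cache with automatic expiry.
from datetime import datetime
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
sessions = client["loop_demo"]["sessions"]

# TTL index: MongoDB removes sessions roughly 30 minutes after "lastSeen".
sessions.create_index([("lastSeen", ASCENDING)], expireAfterSeconds=1800)

# Upsert a session document on each authenticated request.
sessions.update_one(
    {"_id": "session-abc123"},
    {"$set": {"userId": "cpa-77", "role": "CPA", "lastSeen": datetime.utcnow()}},
    upsert=True,
)
```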

Logging Layer: We also use MongoDB to store the millions of client requests the Loop application receives each day. This enables us to build Tableau reports on top of the logs to troubleshoot production performance issues for each user session, and for each of the 220 regional KPMG sites spread across France. We are using the MongoDB Connector for BI to generate these reports in Tableau.
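Logs stored this way can also be summarized directly in the database; the hedged sketch below, with hypothetical log fields, surfaces the slowest regional sites over a given period.

```python
# Sketch: summarizing request logs stored in MongoDB to spot slow sites.
# Field names are illustrative, not the actual Loop logging schema.
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
logs = client["loop_demo"]["request_logs"]

pipeline = [
    {"$match": {"timestamp": {"$gte": datetime(2016, 11, 1)}}},
    {"$group": {
        "_id": "$siteId",
        "avgLatencyMs": {"$avg": "$latencyMs"},
        "requests": {"$sum": 1},
    }},
    {"$sort": {"avgLatencyMs": -1}},
    {"$limit": 10},   # the ten slowest regional sites
]

for site in logs.aggregate(pipeline):
    print(site)
```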

Why did you choose MongoDB?
When we started development back in 2012, we knew we needed schema flexibility to handle the massive variance in data structures the accounting suite would need to store and process. This requirement disqualified traditional relational databases from handling the caching, metadata management, and KPI benchmarking computations. As we explored different NoSQL options, we were concerned that we’d over-complicate our architecture by running separate caches and databases. However, in performance testing, MongoDB offered the flexibility and scalability to serve both use cases. It outperformed the NoSQL databases and dedicated caches we tested, and so we took the decision to build our platform around MongoDB.

As our accounting suite is built on JavaScript, close integration between the JavaScript application and the database was also a significant advantage in helping us accelerate development cycles.

As we were developing our new financial benchmarking service last year, we evaluated Microsoft’s Azure Cosmos DB (note: at the time this was called DocumentDB), but MongoDB offered much richer query and indexing functionality. We also considered building the benchmarking analytics on Hadoop, but the architecture of MongoDB, coupled with the power of the aggregation pipeline, gave us a much simpler solution while delivering the data lake functionality we needed. Aggregation enhancements delivered in MongoDB 3.2, especially the introduction of the $lookup operator, were key to our technology decisions.
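For readers unfamiliar with it, $lookup performs a left outer join between two collections inside a single aggregation. The sketch below, with hypothetical invoice and customer collections, joins the two and totals invoiced amounts per customer industry.

```python
# Sketch: a left outer join with $lookup, available since MongoDB 3.2.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["datalake_demo"]

pipeline = [
    # Join each invoice to its customer record.
    {"$lookup": {
        "from": "customers",
        "localField": "customerId",
        "foreignField": "_id",
        "as": "customer",
    }},
    {"$unwind": "$customer"},
    # Total invoiced amount per customer industry.
    {"$group": {
        "_id": "$customer.industry",
        "totalInvoiced": {"$sum": "$amount"},
    }},
]

for row in db["invoices"].aggregate(pipeline):
    print(row)
```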

Can you describe what your MongoDB deployment looks like?
Both the caching layer and metadata management run on dedicated three-node replica sets. This gives the accounting suite fault resilience to ensure always-on availability. The metadata is largely read-only, while the caching layer serves a mixed read/write workload.

The data lake is deployed as a sharded cluster, handling large batch loads of data from clients’ business systems while concurrently serving complex analytics queries and reports to the CPAs.

We are running MongoDB on Windows instances in the Microsoft Azure cloud, after migrating from our own data center. We needed to ensure we could meet the scalability demands of the app, and the cloud is a better place to do that, rather than investing in our own infrastructure.

How do you support and manage your deployment?
We use MongoDB’s fully managed database service, MongoDB Atlas, and have access to 24x7 proactive support from MongoDB engineers. We have also recently used the Production Readiness package from MongoDB consulting services.

The combination of the cloud database service, professional services, and technical support is proving invaluable:

  • The MongoDB consultants reviewed our operational processes and Azure deployment plans, from which they were able to provide guidance and best practices to execute the migration without interruption to the business. They also helped us create an operations playbook to institutionalize best practices going forward.
  • MongoDB Atlas automated the configuration and provisioning of MongoDB instances on Azure, and we rely on it now to handle ongoing upgrades and maintenance. A few simple clicks in the UI eliminate the need for us to develop our own configuration management scripts.
  • MongoDB Atlas also provides high-resolution telemetry on the health of our MongoDB databases, enabling us to proactively address any issues before they impact the CPAs.
  • Data integrity is obviously key to our business, and so Atlas is invaluable in providing continuous backups of our data lake. We evaluated managing backup ourselves, but ultimately it was much more cost effective for MongoDB to manage it for us as part of the fully managed backup service available through Atlas.

As part of your migration to Azure, you also migrated to the latest MongoDB 3.2 release. Can you share the results of that upgrade?
One word – scalability. With MongoDB 3.2 now using WiredTiger as its default storage engine, we can achieve much higher throughput and scalability on a lower hardware footprint.

The accounting suite supports almost 7,000 internal and external customers today, with half of them connecting for an average of 5 hours every working day. But we plan to roll it out to 20,000 customers over the next 18 months. We’ve been able to load test the suite against our development cluster, and MongoDB has scaled to 5x the current sessions, analytics and data volumes with no issues at all. WiredTiger’s document level concurrency control and storage compression are key to these results.

What future plans do you have for the Loop accounting suite?
We want to automate more of the benchmarking, and enable further data exploration to build predictive analytics models for our customers. This will enable us to provide benchmarks against both historic data, as well as evaluate future likely business outcomes. We plan on using the Azure Machine Learning framework against our MongoDB data lake.

How are you measuring the impact of MongoDB on your business?
We estimate that by selecting MongoDB for the accounting suite we achieved at least a 50% faster time to market than we would have with any other non-relational database. The tight integration with JavaScript, flexible data model, out-of-the-box performance, and sophisticated management platform have all been key to enabling developer productivity and reducing operational costs.

The accounting suite’s financial benchmarking service is a highly innovative application that provides KPMG France with significant competitive advantage. We have access to a lot of customer information which becomes actionable with our data lake built on MongoDB. It allows us to store that data cost effectively, while supporting rich analytics to give insights that other accounting practices just can’t match.

Christian, thanks for taking the time to share your story with the MongoDB community.

Thinking of implementing a data lake? Learn more from our guide:

Bringing Online Big Data to BI & Analytics with MongoDB


About the Author - Mat Keep

Mat is a director within the MongoDB product marketing team, responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp. with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. This followed a series of sales, business development and analyst / programmer positions with both technology vendors and end-user companies.

Crossing the Chasm: Looking Back on a Seminal Year of Cloud Technology

This post is part of our Road to re:Invent series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.

![Road to AWS re:Invent](https://webassets.mongodb.com/_com_assets/cms/AWS_ReInvent-683wqzsi2z.jpg)

On the main stage of Amazon’s AWS re:Invent conference in Las Vegas last year, Capital One’s CIO, Rob Alexander, made his way into the headlines of tech publications when he explained that, under his leadership, the bank would reduce its number of data centers from 8 in 2015 to just 3 in 2018. Capital One began using cloud-hosted infrastructure organically, with developers turning to the public cloud for a quick and easy way to provision development environments. The increase in productivity prompted IT leadership to adopt a cloud-first strategy not just for development and test environments, but for some of the bank’s most vital production workloads.

What generated headlines just a year ago has now become just one of many examples of large enterprises shifting mission-critical deployments to the cloud.

In a recent report released by McKinsey & Company, the authors declared “the cloud debate is over—businesses are now moving a material portion of IT workloads to cloud environments.” The report goes on to validate what many industry-watchers (including MongoDB, in our own Cloud Brief this May) have noted: cloud adoption in the enterprise is gaining momentum and is driven primarily by benefits in time to market.

According to McKinsey’s survey, almost half (48 percent) of large enterprises have migrated an on-premises workload to the public cloud. Based on the conventional model of innovation adoption, this marks the divide between the “early majority” of cloud adopters and the “late majority.” This not only means that the cloud computing “chasm” has been crossed, but that we have entered the period where the near-term adoption of cloud-centric strategies will play a strong role in an organization’s ability to execute, and as a result, its longevity in the market.

![](https://webassets.mongodb.com/_com_assets/cms/AWS_ReInvent_Adoption_Lifecycle-awjdat7emu.png)
Image source: [Technology Adoption Lifecycle](https://upload.wikimedia.org/wikipedia/commons/d/d3/Technology-Adoption-Lifecycle.png)

An additional indication that the “chasm” has been bridged comes as more heavily regulated industries set aside oft-cited security concerns and pair public cloud usage with other broad-scale digitization initiatives. As Amazon, Google, and Microsoft (the three “hyperscale” public cloud vendors, as McKinsey defines them) continue to invest significantly in securing their services, the most memorable soundbite from Alexander’s keynote continues to ring true: that Capital One can “operate more securely in the public cloud than we can in our own data centers."

As the concern over security in the public cloud continues to wane, other barriers to cloud adoption are becoming more apparent. Respondents to McKinsey’s survey and our own Cloud Adoption Survey earlier this year reported concerns of vendor lock-in and of limited access to talent with the skills needed for cloud deployment. With just 4 vendors holding over half of the public cloud market, CIOs are careful to select technologies that have cross-platform compatibility as Amazon, Microsoft, IBM, and Google continue to release application and data services exclusive to their own clouds.

This reluctance to outsource certain tasks to the hyperscale vendors is tempered, however, by a limited talent pool. Developers, DBAs, and architects with experience building and managing internationally distributed, highly available, cloud-based deployments are in high demand. In addition, it is becoming more complex for international businesses to comply with the changing landscape of local data protection laws as legislators try to keep pace with cloud technology. As a result, McKinsey predicts enterprises will increasingly turn to managed cloud offerings to offset these costs.

It is unclear whether the keynote at Amazon’s re:Invent conference next month will once again presage the changing enterprise technology landscape for the coming year. However, we can be certain that the world’s leading companies will be well represented as the public cloud continues to entrench itself even deeper in enterprise technology.


MongoDB Atlas, the cloud database service for MongoDB, is the easiest way to deploy and run MongoDB, allowing you to get started in minutes. Click here to learn more.

The MongoDB team will be at AWS re:Invent this November in Las Vegas and our CTO Eliot Horowitz will be speaking Thursday (12/1) afternoon. If you’re attending re:Invent, be sure to attend the session & visit us at booth #2620!

Learn more about AWS re:Invent

How Saavn Grew to India’s Largest Music Streaming Service with MongoDB

Building a push notification system on a sophisticated data analytics pipeline powered by Apache Kafka, Storm and MongoDB

2015 was an important year for the music industry. It was the first time digital became the primary revenue source for recorded music, overtaking sales of physical formats. Key to this milestone was the revenue generated by streaming services – growing over 45% in a single year.

As with many consumer services, the music streaming market is fragmented across the globe. In India – the second most populous country on the planet and the second-largest smartphone market – Saavn has grown to become the subcontinent’s largest music service. It has 80 million subscribers and has seen a 9x increase in Daily Active Users (DAU) in just 24 months, with 90% of its streams served to mobile users. Many factors have collectively driven Saavn’s growth – but at the heart of it is data. And for this, they rely on MongoDB.

![](https://webassets.mongodb.com/_com_assets/cms/Saavn-Logo-Horizontal-White-500-eua0kyb1uk.png)

Saavn started out using MongoDB as a persistent cache, replacing an existing memcached layer. The team soon realized the database was versatile and flexible enough to serve as the system of record for its data on subscribers, devices, and user activity. MongoDB’s flexibility and scalability proved instrumental in keeping pace with Saavn’s breakneck growth.

Through its extensive collection of music, the company quickly attracted new users to its streaming service, but found engagement often dropped away. It identified that push notifications sent directly to client devices were key to reconnecting with users and keeping them engaged by serving personalized playlists. At this year’s MongoDB World conference, CTO Sriranjan Manjunath presented how Saavn has used MongoDB as part of a sophisticated analytics pipeline to drive a 3x increase in user engagement.

As Sriranjan and his team observed, it wasn’t enough to simply broadcast generic notifications to its users. Instead, Saavn needed to craft notifications with playlists personalized to each user. Saavn built a sophisticated data processing pipeline that uses a scheduler to extract device, activity, and user data stored in MongoDB. From there, it computes relevant playlists by analyzing a user’s listening preferences, activity, device, location, and more. It then sends the computed recommendations to a dispatcher process that delivers the playlist to each user’s device and inbox. To refine personalization, all user activity is ingested back into a Kafka queue, where it is processed by Apache Storm and written back to MongoDB. Saavn is also expanding its use of artificial intelligence to better predict users’ interests, and is using MongoDB to store the resulting machine learning models and serve them in real time to the recommender application.
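A heavily simplified sketch of that flow is shown below. The database, collection, and field names are assumptions, and the recommendation logic is a placeholder; the idea is simply that a scheduler-style job reads recent listening activity from MongoDB and writes back a playlist document for a dispatcher to deliver.

```python
# Sketch: a scheduler-style job that turns recent listening activity into a
# personalized playlist document. Names and logic are illustrative only.
from datetime import datetime
from pymongo import DESCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["saavn_demo"]

def build_playlist(user_id, limit=20):
    """Naive placeholder: recommend the user's most recently played tracks."""
    recent = db["activity"].find(
        {"userId": user_id, "event": "play"},
        sort=[("timestamp", DESCENDING)],
        limit=limit,
    )
    return [doc["trackId"] for doc in recent]

def publish_recommendation(user_id):
    # A dispatcher process would later read this document and push the
    # notification to the user's device and inbox.
    db["recommendations"].update_one(
        {"userId": user_id},
        {"$set": {"playlist": build_playlist(user_id),
                  "computedAt": datetime.utcnow()}},
        upsert=True,
    )

publish_recommendation("user-42")
```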

The system currently sends 30m notifications per day, but has been sized to support up to 1m per minute, providing plenty of headroom to support Saavn’s continued growth.

In his presentation, Sriranjan discussed how Saavn migrated from MongoDB 2.6 to MongoDB 3.0, taking advantage of the WiredTiger storage engine’s document-level concurrency control to deliver improved performance. He shared his key learnings in modifying schema design to reflect the differences in how updates are handled by the underlying storage engine, and in using TTL indexes to automatically expire data from MongoDB. Sriranjan also discussed shard key selection to optimize uniform data distribution across the cluster, and the benefits of using MongoDB Cloud Manager for system monitoring and continuous backups, including integration with Slack for automated alerting to the ops team.
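A hedged sketch of those two techniques follows; the database, collection, field, and key names are placeholders, and the commands must run against a sharded cluster via a mongos router.

```python
# Sketch: TTL-based expiry and hashed shard key selection, as discussed above.
# Names are placeholders; run against a sharded cluster via mongos.
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://mongos-host:27017")
activity = client["saavn_demo"]["activity"]

# Expire raw activity documents 30 days after their "timestamp".
activity.create_index([("timestamp", ASCENDING)],
                      expireAfterSeconds=30 * 24 * 3600)

# A hashed shard key on userId spreads inserts uniformly across shards,
# avoiding hot spots from monotonically increasing keys.
client.admin.command("enableSharding", "saavn_demo")
client.admin.command("shardCollection", "saavn_demo.activity",
                     key={"userId": "hashed"})
```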

Click through to view Saavn’s presentation from MongoDB World

To learn more about managing real time streaming data, download:

The MongoDB and Kafka white paper


About the author - Mat Keep

Mat is a director within the MongoDB product marketing team, responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp. with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. This followed a series of sales, business development and analyst / programmer positions with both technology vendors and end-user companies.

Bringing MongoDB and Microservices to UPS i-parcel

How does the UPS i-parcel service give online shoppers a native checkout experience in more than 100 countries and with 70 different currencies? In our webinar, Yursil Kidwai, Vice President of Technology at UPS i-parcel, explains how an infrastructure of MongoDB-powered microservices helps answer that question.