We are very excited to announce that Cloud Manager Backup now supports the following command-line options:
smallfiles(this is an option for MMAPv1 only)
Cloud Manager Backup will now take into account the command line options of the primary during an initial sync. If your primary uses one of these options, and you want your backup to as well, just resync your backups. If you’ve been holding off using Cloud Manager Backup because of our lack of support of these options, you need wait no more.
Please note that existing snapshots will not be converted, only snapshots created for jobs that were resynced after noon, EDT on May 16 will have these options enabled.
Unlocking Operational Intelligence from the Data Lake: Part 2 - Operationalizing the Data Lake
As we discussed in part 1 , Hadoop-based data lakes excel at generating new forms of insight from diverse data sets, but are not designed to provide real-time access to operational applications. Users need to make analytic outputs from Hadoop available to their online, operational apps. These applications have specific access demands that cannot be met by HDFS, including: Millisecond latency query responsiveness. Random access to indexed subsets of data. Supporting expressive ad-hoc queries and aggregations against the data, making online applications smarter and contextual. Updating fast-changing data in real time as users interact with online applications, without having to rewrite the entire data set. Bringing together operational and analytical processing across high volumes of variably structured data in a single database requires capabilities unique to MongoDB: Workload isolation. MongoDB replica sets can be provisioned with dedicated analytic nodes. This allows users to simultaneously run real-time analytics and reporting queries against live data, without impacting nodes servicing the operational application, and avoiding lengthy ETL cycles. Dynamic schema, coupled with data governance. MongoDB's document data model makes it easy for users to store and combine data of any structure, without giving up sophisticated validation rules, data access and rich indexing functionality. If new attributes need to be added – for example enriching user profiles with geo-location data – the schema can be modified without application downtime, and without having to update all existing records. Expressive queries. The MongoDB query language enables developers to build applications that can query and analyze the data in multiple ways – by single keys, ranges, text search, and geospatial queries through to complex aggregations and MapReduce jobs, returning responses in milliseconds. Complex queries are executed natively in the database without having to use additional analytics frameworks or tools, and avoiding the latency that comes from moving data between operational and analytical engines. Rich secondary indexes. Providing fast filtering and access to data by any attribute, MongoDB supports compound, unique, array, partial, TTL, geospatial, sparse, and text indexes to optimize for multiple query patterns, data types and application requirements. Indexes are essential when operating across slices of the data, for example updating the churn analysis of a subset of high net worth customers, without having to scan all customer data. BI & analytics integration. The MongoDB Connector for BI enables industry leading analytical and visualization tools such as Tableau to efficiently access data stored in MongoDB using standard SQL. Robust security controls. Extensive access controls, auditing for forensic analysis and encryption of data both in-flight and at-rest enables MongoDB to protect valuable information and meet the demands of big data workloads in regulated industries. Scale-out on commodity hardware. MongoDB can be scaled within and across geographically distributed data centers, providing extreme levels of availability and scalability. As your data lake grows, MongoDB scales easily with no downtime and no application changes. Advanced management and cloud platform. To reduce data lake TCO and risk of application downtime, MongoDB Ops Manager provides powerful tooling to automate database deployment, scaling, monitoring and alerting, and disaster recovery. Further simplifying operations, MongoDB Atlas delivers MongoDB as a service, providing all of the features of the database, without the operational heavy lifting required for any application. MongoDB Atlas is a great choice if you want the database run for you, or if your data lake and apps are also running on a public cloud platform. MongoDB Atlas is available on-demand through a pay-as-you-go model and billed on an hourly basis. High skills availability. With availability of Hadoop skills cited by Gartner analysts as a top challenge, it is essential you choose an operational database with a large available talent pool. This enables you to find staff who can rapidly build differentiated big data applications. Across multiple measures, including DB Engines Rankings , The 451 Group NoSQL Skills Index and the Gartner Magic Quadrant for Operational Databases , MongoDB is the leading non-relational database. In addition, the ability to apply the same distributed processing frameworks such as Apache Spark, MapReduce and Hive to data stored in both HDFS and MongoDB allows developers to converge analytics of both real time, rapidly changing data sets with the models created by batch Hadoop jobs. Through sophisticated connectors, Spark and Hadoop can pass queries as filters and take advantage of MongoDB’s rich secondary indexes to extract and process only the range of data it needs – for example, retrieving all customers located in a specific geography. This is very different from less featured datastores that do not support a rich query language or secondary indexes. In these cases, Spark and Hadoop jobs are limited to extracting all data based on a simple primary key, even if only a subset of that data is required for the query. This means more data movement between the data lake and the database, more processing overhead, more hardware, and longer time-to-insight for the user. Table 1: How MongoDB stacks up for operational intelligence As demonstrated in Table 1, operational intelligence requires a fully-featured database serving as a System of Record for online applications. These requirements exceed the capabilities of simple key-value or column-oriented datastores that are typically used for short lived, transient data, or legacy relational databases structured around rigid row and column table formats and scale-up architectures. Figure 1: Design pattern for operationalizing the data lake Figure 1 presents a design pattern for integrating MongoDB with a data lake: Data streams are ingested to a pub/sub message queue, which routes all raw data into HDFS. Processed events that drive real-time actions, such as personalizing an offer to a user browsing a product page, or alarms for vehicle telemetry, are routed to MongoDB for immediate consumption by operational applications. Distributed processing frameworks such as Spark or MapReduce jobs materialize batch views from the raw data stored in the Hadoop data lake. MongoDB exposes these models to the operational processes, serving queries and updates against them with real-time responsiveness. The distributed processing frameworks can re-compute analytics models, against data stored in either HDFS or MongoDB, continuously flowing updates from the operational database to analytics views. In part 3, we’ll demonstrate how leading companies are using the design pattern discussed above to operationalize their data lakes. Learn more by reading the Operational Data Lake white paper. Unlocking Operational Intelligence from the Data Lake About the Author - Mat Keep Mat is director of product and market analysis at MongoDB. He is responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp. with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. This followed a series of sales, business development and analyst / programmer positions with both technology vendors and end-user companies.
Data Resilience with MongoDB Atlas
Data is the central currency in today's digital economy. Studies have shown that 43% of companies that experience major data loss incidents are unable to resume business operations. A range of scenarios can lead to data loss, yet within the realm of database technology, they typically fall under three main categories: catastrophic technical malfunctions, human error, and cyber attacks. A data loss event due to a catastrophic breakdown, human error, or cyber attack is not a matter of if, but a matter of when it will occur. Hence, businesses need to focus on how to avoid and minimize the effects as much as possible. Failure to effectively address these risks can lead to extended periods of downtime of a few hours or even a few weeks following an incident. The average cost of cyberattacks is a surprising $4.45 million, with some attacks costing in the hundreds of millions. Reputational harm is harder to quantify but no doubt real and substantial. The specific industry you're in might be subject to regulatory frameworks designed to counter cyber attacks. Businesses that are subject to regulatory regimes must maintain compliance with these requirements. This can determine the configuration of your disaster recovery approach. In this blog post, we'll explain the key disaster recovery (DR) capabilities available with MongoDB Atlas . We'll also cover the core responsibilities and strategies for data resilience including remediation, and recovery objectives (RTO/RPO). Planning for data resilience in Atlas Data resilience is not a one-size-fits-all proposition, which is why we offer a range of choices in Atlas for a comprehensive strategy. Our sensible defaults ensure you're automatically safeguarded, while also offering a variety of choices to precisely align with the needs of each individual application. When formulating a disaster recovery plan, organizations commonly begin by assessing their recovery point objective (RPO) and recovery time objective (RTO). The RPO specifies the amount of data the business can tolerate losing during an incident, while the RTO indicates the speed of recovery. Since not all data carries the same urgency, analyzing the RPO and RTO on a per-application basis is important. For instance, critical customer data might have specific demands compared to clickstream analytics. The criteria for RTO, RPO, and the length of time you need to retain backups will influence the financial and performance implications of maintaining backups. With MongoDB Atlas, we provide standard protective measures by default, with customizable options for tailoring protection to the service level agreements specified by the RPO and RTO in your DR plan. These are enhanced by additional features that can be leveraged to achieve greater levels of availability and durability for your most vital tasks. These features can be grouped into two main categories: prevention and recovery. Backup, granular recovery, and resilience There are many built-in features that are designed to prevent disasters from ever happening in the first place. Some key features and capabilities that enable a comprehensive prevention strategy include multi-region and multi-cloud clusters , encryption at rest , Queryable Encryption , cluster termination safeguards , backup compliance protocols , and the capability to test resilience . (We will discuss the features in-depth in part two of this series.) While prevention might satisfy the resilience needs of certain applications, different applications may demand greater resilience against failures based on the business requirements of data protection and disaster recovery. MongoDB provides comprehensive management of data backups, including the geographic distribution of backups across multiple regions, and the ability to prevent backups from being deleted, all through an automated retention schedule. Recovery capabilities are aimed at supporting RTO and minimizing data loss and include continuous cloud backups with point-in-time recovery. Atlas cloud backups utilize the native snapshot feature of your cluster's cloud service provider, ensuring backup storage is kept separate from your MongoDB Atlas instances. Backups are essentially snapshots that capture the condition of your database cluster at a specific moment. They serve as a safeguard in case data is lost or becomes corrupted. For M10+ clusters, you have the option of utilizing Atlas Cloud Backups, which leverage the cluster's cloud service provider for storing backups in a localized manner. Atlas comes with strong default backup retention of 12 months out of the box. You also have the option to customize snapshot and retention schedules, including the time of day for snapshots, the frequency at which snapshots are taken over time, and retention duration. Another important feature is continuous cloud backup with point-in-time recovery, which enables you to restore data to the moment just before any incident or disruption, such as a cyber attack. To ensure your backups are regionally redundant and you can still restore even if the primary region that your backups are in is down, MongoDB Atlas offers the ability to copy these critical backups, with the point-in-time data, to any secondary region available from your cloud provider in Atlas. For the most stringent regulations, or for businesses that want to ensure backups are available even after a bad actor or cyber attack, MongoDB Atlas can ensure that no user, regardless of role, can ever delete a backup before a predefined protected retention period with the Backup Compliance Policy. Whatever your regulatory obligations or business needs are, MongoDB Atlas provides the flexibility to tailor your backup settings for requirements. Crucially, this ensures you can recover quickly, minimizing data loss and meeting your RPO in the event of a disaster recovery scenario. When properly configured, testing has shown that Atlas can quickly recover to the exact timestamp before a disaster or failure event, giving you a zero-minute RPO and RTO of less than 15 minutes when utilizing optimized restores. Recovery times can vary due to cloud provider disk warming and which point in time you are restoring to. So, it is important to also test this regularly. This means that regardless of your regulatory or business requirements, MongoDB Atlas allows you to configure your backups to ensure that you can meet your recovery requirements and, most importantly, recover with precision and speed to ensure that your data loss is minimal and your recovery point objectives are met should you experience a recovery event. Conclusion As regulations and business needs continue to evolve, and cyber-attacks become more sophisticated and varied, creating and implementing a data resilience strategy can be simple and manageable. MongoDB Atlas comes equipped with built-in measures that deliver robust data resilience at the database layer, ensuring your ability to both avoid incidents and promptly restore operations with minimal data loss if an incident does occur. Furthermore, setting up and overseeing additional advanced data resilience features is straightforward, with automation driven by a pre-configured policy that operates seamlessly at any scale. This streamlined approach supports compliance without the need for manual interventions, all within the MongoDB Atlas platform. For more information on the data resilience and disaster recovery features in MongoDB Atlas, download the Data Resilience Strategy with MongoDB Atlas whitepaper. To get started on Atlas today, we invite you to launch a free tier today .