We are very excited to announce that Cloud Manager Backup now supports the following command-line options:
smallfiles (this is an option for MMAPv1 only)
Cloud Manager Backup will now take into account the command line options of the primary during an initial sync. If your primary uses one of these options, and you want your backup to as well, just resync your backups. If you’ve been holding off using Cloud Manager Backup because of our lack of support of these options, you need wait no more.
Please note that existing snapshots will not be converted; only snapshots created for jobs that were resynced after noon EDT on May 16 will have these options enabled.
Unlocking Operational Intelligence from the Data Lake: Part 2 - Operationalizing the Data Lake
As we discussed in part 1, Hadoop-based data lakes excel at generating new forms of insight from diverse data sets, but are not designed to provide real-time access to operational applications. Users need to make analytic outputs from Hadoop available to their online, operational apps. These applications have specific access demands that cannot be met by HDFS, including:

- Millisecond latency query responsiveness.
- Random access to indexed subsets of data.
- Supporting expressive ad-hoc queries and aggregations against the data, making online applications smarter and contextual.
- Updating fast-changing data in real time as users interact with online applications, without having to rewrite the entire data set.

Bringing together operational and analytical processing across high volumes of variably structured data in a single database requires capabilities unique to MongoDB:

- Workload isolation. MongoDB replica sets can be provisioned with dedicated analytic nodes. This allows users to simultaneously run real-time analytics and reporting queries against live data, without impacting nodes servicing the operational application, and avoiding lengthy ETL cycles.
- Dynamic schema, coupled with data governance. MongoDB's document data model makes it easy for users to store and combine data of any structure, without giving up sophisticated validation rules, data access and rich indexing functionality. If new attributes need to be added – for example enriching user profiles with geo-location data – the schema can be modified without application downtime, and without having to update all existing records.
- Expressive queries. The MongoDB query language enables developers to build applications that can query and analyze the data in multiple ways – by single keys, ranges, text search, and geospatial queries through to complex aggregations and MapReduce jobs, returning responses in milliseconds.
Complex queries are executed natively in the database without having to use additional analytics frameworks or tools, and avoiding the latency that comes from moving data between operational and analytical engines.
- Rich secondary indexes. Providing fast filtering and access to data by any attribute, MongoDB supports compound, unique, array, partial, TTL, geospatial, sparse, and text indexes to optimize for multiple query patterns, data types and application requirements. Indexes are essential when operating across slices of the data, for example updating the churn analysis of a subset of high net worth customers, without having to scan all customer data.
- BI & analytics integration. The MongoDB Connector for BI enables industry leading analytical and visualization tools such as Tableau to efficiently access data stored in MongoDB using standard SQL.
- Robust security controls. Extensive access controls, auditing for forensic analysis and encryption of data both in-flight and at-rest enable MongoDB to protect valuable information and meet the demands of big data workloads in regulated industries.
- Scale-out on commodity hardware. MongoDB can be scaled within and across geographically distributed data centers, providing extreme levels of availability and scalability. As your data lake grows, MongoDB scales easily with no downtime and no application changes.
- Advanced management and cloud platform. To reduce data lake TCO and risk of application downtime, MongoDB Ops Manager provides powerful tooling to automate database deployment, scaling, monitoring and alerting, and disaster recovery. Further simplifying operations, MongoDB Atlas delivers MongoDB as a service, providing all of the features of the database, without the operational heavy lifting required for any application. MongoDB Atlas is a great choice if you want the database run for you, or if your data lake and apps are also running on a public cloud platform.
MongoDB Atlas is available on-demand through a pay-as-you-go model and billed on an hourly basis.
- High skills availability. With availability of Hadoop skills cited by Gartner analysts as a top challenge, it is essential you choose an operational database with a large available talent pool. This enables you to find staff who can rapidly build differentiated big data applications. Across multiple measures, including DB Engines Rankings, The 451 Group NoSQL Skills Index and the Gartner Magic Quadrant for Operational Databases, MongoDB is the leading non-relational database.

In addition, the ability to apply the same distributed processing frameworks such as Apache Spark, MapReduce and Hive to data stored in both HDFS and MongoDB allows developers to converge analytics of real-time, rapidly changing data sets with the models created by batch Hadoop jobs.

Through sophisticated connectors, Spark and Hadoop can pass queries as filters and take advantage of MongoDB’s rich secondary indexes to extract and process only the range of data they need – for example, retrieving all customers located in a specific geography. This is very different from less fully featured datastores that do not support a rich query language or secondary indexes. In these cases, Spark and Hadoop jobs are limited to extracting all data based on a simple primary key, even if only a subset of that data is required for the query. This means more data movement between the data lake and the database, more processing overhead, more hardware, and longer time-to-insight for the user.

Table 1: How MongoDB stacks up for operational intelligence

As demonstrated in Table 1, operational intelligence requires a fully-featured database serving as a System of Record for online applications.
These requirements exceed the capabilities of simple key-value or column-oriented datastores that are typically used for short lived, transient data, or legacy relational databases structured around rigid row and column table formats and scale-up architectures.

Figure 1: Design pattern for operationalizing the data lake

Figure 1 presents a design pattern for integrating MongoDB with a data lake:

- Data streams are ingested to a pub/sub message queue, which routes all raw data into HDFS. Processed events that drive real-time actions, such as personalizing an offer to a user browsing a product page, or alarms for vehicle telemetry, are routed to MongoDB for immediate consumption by operational applications.
- Distributed processing frameworks such as Spark or MapReduce jobs materialize batch views from the raw data stored in the Hadoop data lake.
- MongoDB exposes these models to the operational processes, serving queries and updates against them with real-time responsiveness.
- The distributed processing frameworks can re-compute analytics models, against data stored in either HDFS or MongoDB, continuously flowing updates from the operational database to analytics views.

In part 3, we’ll demonstrate how leading companies are using the design pattern discussed above to operationalize their data lakes.

Learn more by reading the Operational Data Lake white paper.
Unlocking Operational Intelligence from the Data Lake

About the Author - Mat Keep
Mat is director of product and market analysis at MongoDB. He is responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp. with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. This followed a series of sales, business development and analyst / programmer positions with both technology vendors and end-user companies.
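
For readers who prefer code to diagrams, the Figure 1 design pattern from this post can be loosely sketched in plain Python. This is only an in-memory illustration under stated assumptions, not a real deployment: the class `DataLakePipeline`, its methods, and the `real_time` event flag are hypothetical names invented for this sketch, a list stands in for HDFS, and a dictionary stands in for a MongoDB collection.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DataLakePipeline:
    """Toy, in-memory model of the Figure 1 flow (all names are hypothetical)."""

    raw_store: list = field(default_factory=list)          # stands in for HDFS
    operational_store: dict = field(default_factory=dict)  # stands in for a MongoDB collection

    def ingest(self, event: dict) -> None:
        """Route one event the way the pub/sub queue does in the pattern."""
        # All raw data is routed into the data lake.
        self.raw_store.append(event)
        # Events that drive real-time actions also go to the operational store.
        if event.get("real_time"):
            self.operational_store[event["id"]] = event

    def materialize_batch_view(self) -> dict:
        """Stand-in for a Spark or MapReduce job: count raw events per user."""
        return dict(Counter(e["user"] for e in self.raw_store))

    def publish_view(self, view: dict) -> None:
        """Expose the batch view to operational applications."""
        for user, count in view.items():
            self.operational_store[f"view:{user}"] = {"user": user, "event_count": count}


pipeline = DataLakePipeline()
pipeline.ingest({"id": "e1", "user": "ann", "type": "page_view", "real_time": True})
pipeline.ingest({"id": "e2", "user": "ann", "type": "click", "real_time": False})
pipeline.ingest({"id": "e3", "user": "bob", "type": "telemetry_alarm", "real_time": True})
pipeline.publish_view(pipeline.materialize_batch_view())
```

In a real system the routing would be done by the message queue, the batch view by a distributed processing job, and the operational store by MongoDB; the sketch only shows which data flows where.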
Revolutionizing Data Storage and Analytics with MongoDB Atlas on Google Cloud and HCL
Every organization requires data they can trust, and access, regardless of its format, size, or location. The rapid pace of change in technology and the shift towards cloud computing are revolutionizing how companies handle, govern and manage their data by freeing them from the heavy operational burden of on-premises deployments. Enterprises are looking for a centralized, cost-effective solution that allows them to scale their storage and analytics so they can ingest data and perform artificial intelligence (AI) and machine learning (ML) operations, ultimately expanding their marketing horizon.

This blog post explores why companies should partner with MongoDB Atlas on Google Cloud to begin their data revolution journey, and how HCL Technologies can support customers looking to migrate.

MongoDB Atlas as the distributed data platform

MongoDB Atlas is the leading database-as-a-service on the market for three main reasons:

- Unparalleled developer experience: allows organizations to bring new features to market at a high velocity
- Horizontal scalability: supports hundreds of terabytes of data with sub-second queries
- Flexibility: stores data to meet various regulatory, operational, and high availability requirements

The versatility offered by MongoDB’s document model makes it ideal for modern data-driven use cases that require support for structured, semi-structured, and unstructured content all within a single platform. Its flexible schema allows changes to support new application features without costly schema migrations typically required with relational databases.

MongoDB Atlas extends the core database by offering services like Atlas Search and MongoDB Realm that are a necessity for modern applications. Atlas Search provides a powerful Apache Lucene-based full text search engine that automatically indexes data in your MongoDB database without the need for a separate dedicated search engine or error-prone replication processes.
Realm provides edge-to-cloud sync and backend services to accelerate and simplify mobile and web development. Atlas’ distributed architecture supports horizontal scaling for data volume, query latency, and query throughput, which offers the scalability benefits of distributed data storage alongside the rich functionality of a fully-featured general purpose database.

MongoDB Atlas is unique in its ability to provide the most wanted database as a managed service and is relied on by the world’s largest companies for their mission-critical production applications.

Innovation powered by collaboration with HCL Technologies

MongoDB’s versatility as a general-purpose database, in addition to its massive scalability, makes it a perfect foundation for analytics, visualization, and AI/ML applications on Google Cloud. As an MSP partner for Google Cloud, HCL Technologies helps enterprises accelerate their digital agenda and mitigate its risks, powered by Google Cloud. We’ve successfully implemented applications leveraging MongoDB Atlas on Google Cloud, building upon MongoDB’s flexible JSON-like data model, rich querying and indexing, and elastic scalability in conjunction with Google Cloud’s class-leading cloud infrastructure, data analytics, and machine learning capabilities.

HCL is working with some of the world’s largest enterprises in building secure, performant, and cost-effective solutions with MongoDB and Google. Possessing technical expertise in Google Cloud, MongoDB, machine learning, and data science, our dedicated team developed a reference architecture that ensures high performance and scalability. This is simplified by MongoDB Atlas’ support for Google Cloud services, which allows it to essentially operate as a cloud-native solution.
Highlighted features include:

- Integration with Google Cloud Key Management Service
- Use of Google Cloud’s native storage snapshot for fast backup and restore
- Ability to create read-only MongoDB nodes in Google Cloud to reduce latency with Google Cloud-native services regardless of where the primary node is located (even other public cloud providers!)
- Integrated billing with Google Cloud
- Ability to span a single MongoDB cluster across Google Cloud regions worldwide, and more

As represented in Figure 1 below, MongoDB Atlas on Google Cloud can be used as a single database solution for transactional, operational, and analytical workloads across a variety of use cases.

Figure 1: MongoDB's core characteristics and features

The following architecture in Figure 2 demonstrates the ease of reading and writing data to MongoDB from Google Cloud services. Dataflow, Cloud Data Fusion, and Dataproc can be leveraged to build data pipelines to migrate data from heterogeneous databases to MongoDB and to feed data to create interactive dashboards using Looker. These data pipelines support both batch and real-time ingestion workloads and can be automated and orchestrated using Google Cloud-native services.

Figure 2: MongoDB Atlas' integration with core Google Cloud services

A data platform built using MongoDB Atlas and Google Cloud offers an integrated suite of services for storage, analysis, and visualization.

Address your business challenges with HCL: Industry use cases

Data-driven solutions built with MongoDB Atlas on Google Cloud have multiple applications across industries such as financial services, media and entertainment, healthcare, oil and gas, energy, manufacturing, retail, and the public sector. Every industry can benefit from this highly integrated storage and analytical solution.
Use Cases and Benefits

- Data lake modernization with low cost and high availability for media and entertainment customers: Maintaining high availability and a low-cost data lake is an obstacle for any online entertainment platform that builds mobile or web ticketing applications. However, building on Google App Engine with MongoDB Atlas Clusters in the backend allows for a high-availability, low-cost data platform that seamlessly feeds data to downstream analytics platforms in real time.
- Unified data platform for retail customers: The retail business frequently requests an agile environment in order to encourage innovation among its engineers. With its agility in scaling and resource management, seamless multi-region clusters, and premium monitoring, running MongoDB Atlas on Google Cloud is a fantastic choice for building a single data platform. This simplifies the management of different data platforms and allows developers to focus on new ideas.
- High-speed real-time data platform of supply chain system for manufacturing units: By having real-time visibility and distributed data services, supply chain data can become a competitive advantage. MongoDB Atlas on Google Cloud provides a solid foundation for creating distributed data services with a unified, easy-to-maintain architecture. The unrivaled speed of MongoDB Atlas simplifies supply chain operations with real-time data analytics.

The way forward

Even in just the past decade, organizations have been forced to adapt to the extremely fast pace of innovation in the data analytics landscape: moving from batch to real-time, on-premises to cloud, gigabytes to petabytes, and the increased accessibility of advanced AI/ML models thanks to providers like Google Cloud. With our track record of success in this domain, HCL Technologies is uniquely positioned to help organizations realize the joint benefits of building data analytics applications with best-of-breed solutions from Google Cloud and MongoDB.
Visit us to learn more about the HCL Google Ecosystem Business Unit and how we can help you harness the power of MongoDB Atlas and Google Cloud Platform to change the way you store and analyze your data through these solutions.