We are very excited to announce that Cloud Manager Backup now supports the following command-line options:

- smallfiles (this is an option for MMAPv1 only)
Cloud Manager Backup will now take into account the command-line options of the primary during an initial sync. If your primary uses one of these options and you want your backup to as well, simply resync your backups. If you've been holding off on using Cloud Manager Backup because of our lack of support for these options, you need wait no longer.
Please note that existing snapshots will not be converted; only snapshots created for jobs that were resynced after noon EDT on May 16 will have these options enabled.
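For reference, the smallfiles option is enabled on the mongod process of the replica set being backed up. A minimal sketch of such an invocation follows; the paths and port are illustrative, not part of the announcement:

```sh
# Start a mongod using the MMAPv1 storage engine with small preallocated
# data files (the smallfiles option referenced above).
# --dbpath and --port values here are examples only.
mongod --storageEngine mmapv1 --smallfiles --dbpath /data/db --port 27017
```

The equivalent configuration-file setting is storage.mmapv1.smallFiles.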
Unlocking Operational Intelligence from the Data Lake: Part 2 - Operationalizing the Data Lake
As we discussed in part 1, Hadoop-based data lakes excel at generating new forms of insight from diverse data sets, but they are not designed to provide real-time access to operational applications. Users need to make analytic outputs from Hadoop available to their online, operational apps. These applications have specific access demands that cannot be met by HDFS, including:

- Millisecond-latency query responsiveness.
- Random access to indexed subsets of data.
- Support for expressive ad-hoc queries and aggregations against the data, making online applications smarter and contextual.
- Updates to fast-changing data in real time as users interact with online applications, without having to rewrite the entire data set.

Bringing together operational and analytical processing across high volumes of variably structured data in a single database requires capabilities unique to MongoDB:

- Workload isolation. MongoDB replica sets can be provisioned with dedicated analytics nodes. This allows users to simultaneously run real-time analytics and reporting queries against live data, without impacting nodes servicing the operational application and avoiding lengthy ETL cycles.
- Dynamic schema, coupled with data governance. MongoDB's document data model makes it easy for users to store and combine data of any structure, without giving up sophisticated validation rules, data access controls, and rich indexing functionality. If new attributes need to be added – for example, enriching user profiles with geo-location data – the schema can be modified without application downtime and without having to update all existing records.
- Expressive queries. The MongoDB query language enables developers to build applications that can query and analyze the data in multiple ways – by single keys, ranges, text search, and geospatial queries, through to complex aggregations and MapReduce jobs – returning responses in milliseconds.
Complex queries are executed natively in the database, without having to use additional analytics frameworks or tools, and without the latency that comes from moving data between operational and analytical engines.

- Rich secondary indexes. Providing fast filtering and access to data by any attribute, MongoDB supports compound, unique, array, partial, TTL, geospatial, sparse, and text indexes to optimize for multiple query patterns, data types, and application requirements. Indexes are essential when operating across slices of the data – for example, updating the churn analysis of a subset of high-net-worth customers without having to scan all customer data.
- BI & analytics integration. The MongoDB Connector for BI enables industry-leading analytics and visualization tools such as Tableau to efficiently access data stored in MongoDB using standard SQL.
- Robust security controls. Extensive access controls, auditing for forensic analysis, and encryption of data both in flight and at rest enable MongoDB to protect valuable information and meet the demands of big data workloads in regulated industries.
- Scale-out on commodity hardware. MongoDB can be scaled within and across geographically distributed data centers, providing extreme levels of availability and scalability. As your data lake grows, MongoDB scales easily with no downtime and no application changes.
- Advanced management and cloud platform. To reduce data lake TCO and the risk of application downtime, MongoDB Ops Manager provides powerful tooling to automate database deployment, scaling, monitoring and alerting, and disaster recovery. Further simplifying operations, MongoDB Atlas delivers MongoDB as a service, providing all of the features of the database without the operational heavy lifting required for any application. MongoDB Atlas is a great choice if you want the database run for you, or if your data lake and apps are also running on a public cloud platform.
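The index types and native analytics described above can be sketched with a hypothetical customers collection. The specs below are plain documents, in the shape PyMongo's create_index() and aggregate() accept; all collection and field names are illustrative assumptions, not from the article:

```python
# Compound index on segment + lifetime value, plus a partial index option
# that covers only high-net-worth customers, so churn updates on that
# slice never scan the full collection.
compound_index = [("segment", 1), ("lifetime_value", -1)]
partial_index_options = {
    "partialFilterExpression": {"lifetime_value": {"$gt": 1_000_000}}
}

# An aggregation pipeline computing average churn risk per region for the
# high-net-worth slice only -- analytics executed natively in the database.
churn_pipeline = [
    {"$match": {"lifetime_value": {"$gt": 1_000_000}}},
    {"$group": {"_id": "$region", "avg_churn_risk": {"$avg": "$churn_risk"}}},
    {"$sort": {"avg_churn_risk": -1}},
]

# With a live connection these would run as:
#   db.customers.create_index(compound_index, **partial_index_options)
#   db.customers.aggregate(churn_pipeline)
```

Because the filter, grouping, and sort all execute inside MongoDB, only the small aggregated result crosses the network.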
MongoDB Atlas is available on demand through a pay-as-you-go model and billed on an hourly basis.

- High skills availability. With availability of Hadoop skills cited by Gartner analysts as a top challenge, it is essential that you choose an operational database with a large available talent pool. This enables you to find staff who can rapidly build differentiated big data applications. Across multiple measures, including DB-Engines Rankings, The 451 Group NoSQL Skills Index, and the Gartner Magic Quadrant for Operational Databases, MongoDB is the leading non-relational database.

In addition, the ability to apply the same distributed processing frameworks, such as Apache Spark, MapReduce, and Hive, to data stored in both HDFS and MongoDB allows developers to converge analytics of real-time, rapidly changing data sets with the models created by batch Hadoop jobs. Through sophisticated connectors, Spark and Hadoop can pass queries down as filters and take advantage of MongoDB's rich secondary indexes to extract and process only the range of data they need – for example, retrieving all customers located in a specific geography. This is very different from less featured datastores that do not support a rich query language or secondary indexes. In those cases, Spark and Hadoop jobs are limited to extracting all data based on a simple primary key, even if only a subset of that data is required for the query. This means more data movement between the data lake and the database, more processing overhead, more hardware, and longer time-to-insight for the user.

Table 1: How MongoDB stacks up for operational intelligence

As demonstrated in Table 1, operational intelligence requires a fully featured database serving as a System of Record for online applications.
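The "customers in a specific geography" example above is the kind of predicate a connector can push down to MongoDB rather than bulk-extracting every record. A minimal sketch of such a filter document follows; the collection name, field name, and coordinates are illustrative assumptions:

```python
# Geography filter that MongoDB can serve from a 2dsphere index on
# "location": only customers within a 50-mile radius of a point are
# extracted, instead of scanning all customer data.
EARTH_RADIUS_MILES = 3963.2

geo_filter = {
    "location": {
        "$geoWithin": {
            # $centerSphere takes [lng, lat] and a radius in radians.
            "$centerSphere": [[-73.99, 40.73], 50 / EARTH_RADIUS_MILES]
        }
    }
}

# With PyMongo this would be passed straight to find():
#   docs = db.customers.find(geo_filter, {"name": 1, "location": 1})
```

A connector that supports filter pushdown forwards this predicate to the server, so only the matching subset moves into Spark or Hadoop for processing.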
These requirements exceed the capabilities of simple key-value or column-oriented datastores that are typically used for short-lived, transient data, and of legacy relational databases structured around rigid row-and-column table formats and scale-up architectures.

Figure 1: Design pattern for operationalizing the data lake

Figure 1 presents a design pattern for integrating MongoDB with a data lake:

- Data streams are ingested to a pub/sub message queue, which routes all raw data into HDFS. Processed events that drive real-time actions, such as personalizing an offer to a user browsing a product page, or alarms for vehicle telemetry, are routed to MongoDB for immediate consumption by operational applications.
- Distributed processing frameworks such as Spark or MapReduce jobs materialize batch views from the raw data stored in the Hadoop data lake.
- MongoDB exposes these models to the operational processes, serving queries and updates against them with real-time responsiveness.
- The distributed processing frameworks can re-compute analytics models against data stored in either HDFS or MongoDB, continuously flowing updates from the operational database to analytics views.

In part 3, we'll demonstrate how leading companies are using the design pattern discussed above to operationalize their data lakes. Learn more by reading the Operational Data Lake white paper: Unlocking Operational Intelligence from the Data Lake.

About the Author - Mat Keep

Mat is director of product and market analysis at MongoDB. He is responsible for building the vision, positioning, and content for MongoDB's products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp., with responsibility for the MySQL database in web, telecoms, cloud, and big data workloads. This followed a series of sales, business development, and analyst/programmer positions with both technology vendors and end-user companies.
4 Ways MongoDB Enhances Your Google BigQuery Experience
MongoDB and Google Cloud continue to build on their partnership, with MongoDB enhancing Google Cloud with pay-as-you-go capabilities, unified billing, and integrations with multiple Google Cloud features, including BigQuery. And when it comes to data architecture, BigQuery and MongoDB are two products that are better together.

Google BigQuery and MongoDB are better together

Google's serverless data warehouse, BigQuery, was launched in 2011 with the aim of enhancing business agility as a cloud-native data warehouse. BigQuery allows for fast queries that can uncover insights using familiar SQL. When MongoDB is added to the database technology stack as a complementary technology, it broadens the developer's capabilities across a variety of use cases, including the following four examples.

Combined impact of the Enterprise Data Warehouse and the Operational Data Store

BigQuery is best suited as an Enterprise Data Warehouse (EDW), meaning it is designed to optimize long-running analytics. MongoDB Atlas, on the other hand, is best suited as an Operational Data Store (ODS), designed to support high-throughput, highly concurrent real-time operational applications that demand random access to an entity's data in native JSON. This combination means that BigQuery and MongoDB are complementary technologies that can jointly deliver more value, each contributing its strongest qualities. BigQuery excels at long-running queries, while Atlas handles real-time operational application needs with thousands of concurrent sessions and millisecond response times.

Enriched end-customer experiences

BigQuery equips data scientists and analysts with machine learning (ML) models and BI tools for structured and semi-structured data at scale. For roles that need results with a turnaround time of a day or more, BigQuery is a strong tool for big data queries.
With MongoDB Atlas, engineers and development teams can build applications faster and handle highly diverse schema, query, and update patterns, adapting to demanding user needs and competition. Atlas can also deliver the real-time (or less than 24-hour) queries that are necessary to keep your business operational. Additionally, data can easily move back and forth between the two platforms, creating a prime combination for running analytics on operational data. Unlocking the full potential of your data across your organization means that everyone has insight into the business metrics they need, when they need it. This allows quicker decision making, as well as stronger and more accurate reporting.

Extensibility to MongoDB Atlas features

On top of the value and synergy that can be realized by a BigQuery + Atlas combination, other Atlas features can enhance the usefulness and sophistication of a data architecture:

- Atlas Charts can be leveraged to create rich visualizations of any data stored within Atlas.
- Atlas Triggers and Alerts can apply database logic in response to events or on a predefined schedule.
- Atlas Search brings full-text search at scale to all data across MongoDB and BigQuery alike.
- Atlas Data Federation enables aggregating data across multiple data sources, such as Atlas clusters and HTTPS endpoints, and transforming it into analytical formats (e.g., Parquet).

This means you can not only access data in real time, but also analyze it in a visual, user-friendly way. This functionality makes your data more actionable, allowing you not only to answer questions about your business data but also to make better predictions and future adjustments based on it. Furthermore, being alerted to certain data-based events and triggering new actions based on that information means your data works more efficiently for you, freeing up time to innovate and focus on core business competencies.
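The transform step between document-shaped operational data and tabular analytics can be illustrated with a small sketch. The document shape and the flatten() helper below are hypothetical, not an official Atlas or BigQuery API:

```python
# Minimal sketch of flattening a nested MongoDB document into the
# dot-separated column names a warehouse table expects.
def flatten(doc, prefix=""):
    """Recursively flatten nested dicts into flat column/value pairs."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row

# Illustrative operational document from a hypothetical orders collection.
order = {
    "order_id": "A1001",
    "customer": {"id": 42, "region": "EMEA"},
    "total": 99.50,
}

print(flatten(order))
# {'order_id': 'A1001', 'customer.id': 42, 'customer.region': 'EMEA', 'total': 99.5}
```

Rows in this shape can then be loaded into an analytics table, while the original nested document continues to serve the operational application.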
Lastly, this approach simplifies your data lifecycle, so JSON data from various applications and endpoints can easily be transformed and consumed for rich analytics.

Deeper understanding of your customer

Businesses can use fully managed MongoDB Atlas to store customer 360 profiles. A 360-degree view of a customer allows businesses to track an individual customer's journey across multiple channels, devices, purchases, and interactions, improving customer satisfaction. With the combination of Atlas and BigQuery, businesses can also use compiled data – such as transactional data, behavioral data, user profiles and segmentations, and business analytics – to match user profiles with products and services using artificial intelligence (AI). Vertex AI, a managed machine learning platform, provides all of the Google Cloud ML services in one place to deploy and maintain AI models. Being able to easily access a 360-degree view of each customer, with automation around the customer journey, helps with engagement and loyalty by improving satisfaction and retention through personalization and targeted marketing communications. It also enables retailers to aggregate customer interactions across all channels and identify valuable new customers.

Google BigQuery and MongoDB Atlas in the real world

Current, a leading U.S. challenger bank, uses innovative approaches, services, and technologies to serve people overlooked by traditional banks, regardless of age or income level, helping to improve their financial outcomes. To create customer experiences that cannot exist in traditional systems, Current chose to leverage Google Cloud, including BigQuery, with MongoDB layered on the platform to achieve its goals. Read the full Current story.

Are you a Google BigQuery customer curious about how MongoDB Atlas can amplify your existing data warehouse or data lake architecture? Try MongoDB Atlas for free today and spin up your first workload in minutes.
Try pay-as-you-go Atlas on the Google Cloud Marketplace.