MongoDB Updates

The newest releases and freshest updates

Atlas Charts Adds Support for Serverless and Online Archive Data Sources

We recently introduced streamlined data sources in Atlas Charts, which eliminate the manual steps involved with adding data sources to Charts. With MongoDB Atlas project data automatically available in Charts, your visualization workflow is quicker and simpler than ever. With this feature, Atlas Charts users can now visualize two new sources of data: serverless instances and Atlas cluster data that’s been archived using MongoDB Atlas Online Archive.

For those unfamiliar with these data sources, here’s a quick summary: A serverless instance is an Atlas deployment model that lets you seamlessly scale usage based on workload demand and ensures you are only charged for the resources you need. Online Archive enables automated data tiering of Atlas data, helping you scale your storage and optimize costs while keeping data accessible.

Use cases

These data sources serve two distinct use cases, based on your needs. Whether you are trying to eliminate upfront resource provisioning with a serverless instance, or archiving high-volume workloads such as time series or log data to reduce costs with Online Archive, Charts makes these sources natively available for visualization with zero ETL, just as it always has with your other Atlas clusters.

To see how easy it is to visualize these new data sources, let’s create a serverless database called “ServerlessInstance0” and separately activate Online Archive, running daily, on a database called “Cluster0” in Atlas (Figure 1).

Figure 1: Screenshot showing a serverless database deployed in MongoDB Atlas.

When setting up an Online Archive, Atlas creates two instances of your data (Figure 2). One instance includes only your archived data. The second contains your archived data and your live cluster data. This setup gives you additional flexibility to query data as your use case demands.

Figure 2: Screenshot showing Online Archive instances in Atlas.

Moving on to the Data Sources page in Charts (Figure 3), all of the data sources are shown, including serverless instances and Atlas cluster data archived in Online Archive, neatly categorized by instance type and ready for use in charts and dashboards. (Note that project owners maintain full control of these data sources.) For more details about connecting and disconnecting data sources, review our documentation.

Figure 3: Screenshot showing Serverless and Online Archive data sources in Atlas Charts.

With these additions, Charts now supports all the cluster configurations you can create in Atlas, and we are excited to see how you achieve your visualization goals using these new data sources. New to Atlas Charts? Get started today by logging into or signing up for MongoDB Atlas, deploying or selecting a cluster, and activating Charts for free.
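One practical note: the two Online Archive instances shown in Figure 2 correspond to two separate connection strings in Atlas, so you can also query them directly with a driver. Here is a minimal pymongo sketch; the URIs are placeholders for the connection strings Atlas displays for your archive, and the database and collection names are ours:

    from pymongo import MongoClient

    # Placeholder URIs -- copy the real ones from the Online Archive
    # "Connect" dialog in Atlas.
    ARCHIVE_ONLY_URI = "mongodb://<user>:<password>@<archive-only-host>/?ssl=true"
    FEDERATED_URI = "mongodb://<user>:<password>@<archive-plus-cluster-host>/?ssl=true"

    # Queries against the archive-only endpoint touch cold data only.
    archive = MongoClient(ARCHIVE_ONLY_URI)
    archived_count = archive.sales.orders.count_documents({"status": "closed"})

    # The second endpoint federates archived and live cluster data, so a
    # single query spans both storage tiers.
    combined = MongoClient(FEDERATED_URI)
    total_count = combined.sales.orders.count_documents({"status": "closed"})
    print(archived_count, total_count)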

October 27, 2022
Updates

Introducing Pay-As-You-Go MongoDB Atlas on Azure Marketplace

MongoDB was an official sponsor of the recent two-day, jam-packed 2022 Microsoft Ignite event, whose central theme was how to empower customers to do more with less in the Microsoft Cloud. The interactive conference created a space for professionals to connect in person with subject matter experts, discuss current and future points of digital transformation, attend workshops, learn about key announcements, and discover innovative new offerings. Microsoft officially announced that MongoDB is part of the set of companies that make up the new Microsoft Intelligent Data Platform Partner Ecosystem, and we are pleased to highlight our expanded alliance.

Our partnership provides a frictionless process for developers to access MongoDB Atlas, the leading multi-cloud developer data platform, on the Microsoft Azure Marketplace. By procuring Atlas through the Azure Marketplace, customers get a streamlined procurement and billing experience and can use their Azure accounts to pay for their Atlas usage. MongoDB is also offering a free trial of the Atlas database through the Azure Marketplace.

With the new Pay-As-You-Go Atlas listing on the Azure Marketplace, you only pay for the Atlas resources you use, with no upfront commitment required. You receive a single monthly invoice on your Azure account that includes your Atlas usage, and you can apply existing Azure committed spend to it. Read the Azure Marketplace documentation to learn how to take advantage of the Microsoft Azure consumption commitment (MACC) and Azure commit to consume (CtC).

You can even start free with an M0 Atlas cluster and scale up as needed. A free Atlas cluster comes with 512 MB of storage, out-of-the-box security features, and a basic support plan. If you’d like to upgrade your support plan, you can select one in Atlas and the additional cost will also be billed through Azure. MongoDB offers several support subscriptions with varying SLAs and levels of technical support.

Whether you’re a new or existing Atlas customer, you can subscribe to Atlas directly from the Azure Marketplace. After you subscribe, you’ll be prompted to log in or create a new Atlas account. You can then deploy a new Atlas cluster or link your existing cluster(s) to your Azure account. Atlas customers can take advantage of best-in-class database features, including:

- Production-grade security features, such as always-on authentication, network isolation, end-to-end encryption, and role-based access controls to keep your data protected.
- Global high availability. Clusters are fault-tolerant and self-healing by default. Deploy across multiple regions for even better guarantees and low-latency local reads.
- Support for any class of workload. Build full-text search, run real-time analytics, share visualizations, and sync to the edge with fully integrated, native Atlas data services that require no manual data replication or additional infrastructure.
- New integrations (including PowerApps, Power Automate, Power BI, Synapse, and Purview) that empower builders, developers, and digital natives to unlock the power of MongoDB Atlas on Azure and seamlessly add Atlas to existing architectures.

With MongoDB Atlas on Microsoft Azure, developers get access to the most comprehensive, secure, and scalable cloud-based developer data platform on the market. Now, with the availability of Atlas on the Azure Marketplace, it’s never been easier to start building with Atlas while streamlining procurement and billing.
Get started today through the Atlas on Azure Marketplace listing.

October 19, 2022
Updates

Introducing Snapshot Distribution in MongoDB Atlas

Data is at the heart of everything we do, and in today’s digital economy it has become an organization's most valuable asset. But the lengths required to protect that data can present added challenges and result in manual processes that slow development, especially when it comes to maintaining a strict backup and recovery strategy. MongoDB Atlas aims to ease this burden by providing the features organizations need not only to retain and protect their data for recovery purposes, but also to meet compliance regulations with ease.

Today we’re excited to announce the release of a new backup feature, Snapshot Distribution. Snapshot Distribution lets you distribute your backup snapshots across multiple geographic regions within your primary cloud provider with the click of a button. You configure how snapshots are distributed directly within your backup policy, and Atlas automatically distributes them to the selected regions; no manual process is necessary.

How to distribute your snapshots

To enable Snapshot Distribution, navigate to the backup policy for your cluster and select the toggle to copy snapshots to other regions. From there, you can add any number of regions within your primary cloud provider (including regions you are not deployed in) to store snapshot copies. You can even customize your configuration to copy only specific types of snapshots to certain regions.

Restore your cluster faster with optimized, intelligent restores

If you need to restore your cluster, Atlas intelligently decides whether to use the original snapshot or a copied snapshot for optimal restore speed. Copied snapshots may be used when you are restoring to a cluster in the same region as a snapshot copy, including multi-region clusters if the snapshots are copied to every cluster region. Alternatively, if the original snapshot becomes unavailable due to a regional outage within your cloud provider, Atlas uses a copy in the nearest region to enable restores regardless of the outage.

Get started with Snapshot Distribution

Although storing additional snapshot copies in multiple places may not always be required, it can be extremely useful in several situations, such as:

- Organizations with a compliance requirement to store backups in geographical locations different from their primary place of operation
- Organizations operating multi-region clusters that want faster direct-attach restores for the entire cluster

If you fall into either of these categories, Snapshot Distribution may be a valuable addition to your current backup policy, allowing you to automate previously manual processes and free up development time to focus on innovation. Check out the documentation to learn more, or navigate to your backup policy to enable this feature.
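If you manage backup policies as code, the same copy settings can also be driven through the Atlas Administration API’s cloud backup schedule endpoint. The sketch below is a rough illustration only: the endpoint path and the copySettings payload shape reflect our reading of the API and may require additional fields (such as replicationSpecId), so verify them against the current Atlas Administration API reference before use.

    import requests
    from requests.auth import HTTPDigestAuth

    PROJECT_ID = "<projectId>"   # placeholders for your project and cluster
    CLUSTER = "Cluster0"
    AUTH = HTTPDigestAuth("<publicKey>", "<privateKey>")  # Atlas API keys

    # Assumed payload shape: copy DAILY snapshots to an extra AWS region.
    payload = {
        "copySettings": [{
            "cloudProvider": "AWS",
            "regionName": "US_WEST_2",
            "frequencies": ["DAILY"],
            "shouldCopyOplogs": False,
        }]
    }

    resp = requests.patch(
        f"https://cloud.mongodb.com/api/atlas/v1.0/groups/{PROJECT_ID}"
        f"/clusters/{CLUSTER}/backup/schedule",
        json=payload,
        auth=AUTH,
    )
    resp.raise_for_status()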

September 29, 2022
Updates

What’s New in Atlas Charts: Streamlined Data Sources

We’re excited to announce a major improvement to managing data sources in MongoDB Atlas Charts: Atlas data is now available for visualization automatically, with zero setup required.

Every visualization relies on an underlying data source. In the past, Charts made adding Atlas data as a source fairly straightforward, but teams still needed to manually choose the clusters and collections that would power their dashboards. Streamlined data sources eliminate those manual steps. This feature further optimizes your data visualization workflow by automatically making the clusters, serverless instances, and federated database instances in your project available as data sources within Charts. For example, if you spin up a new cluster or collection and want to create a visual quickly, you can simply go into one of your dashboards and start building a chart immediately.

Check out streamlined data sources in action: See how the new data sources experience streamlines your data visualization workflow in Charts.

Maintain full control of your data

Although all project data is available automatically to project members by default, we know how important it is to control what data your team can use. For example, you may have sensitive customer data or company financials in a cluster. Project owners maintain full control over limiting access to data like this when needed. As shown in the following image, with a few clicks you can select any cluster or collection, confirm whether any charts are using a data source, and disconnect it when ready. If you have collections that you want some of your team to access but not others, this can easily be configured under Data Access in collection settings, as seen in the following image.

With every release, our goal is to make visualizing Atlas data more frictionless and powerful, and streamlined data sources are a big step in this direction. Building data visualizations just got even easier with Atlas Charts. Give it a try today! New to Atlas Charts? Get started by logging into or signing up for MongoDB Atlas, deploying or selecting a cluster, and activating Charts for free.

September 21, 2022
Updates

MongoDB Connector for Apache Kafka 1.8 Available Now

MongoDB has released version 1.8 of the MongoDB Connector for Apache Kafka, with new monitoring and debugging capabilities. In this article, we’ll highlight the key features of this release.

JMX monitoring

The MongoDB Connector works with Apache Kafka Connect to provide an easy way to move data between MongoDB and Apache Kafka. The connector is written in Java and now implements Java Management Extensions (JMX) interfaces that give you access to metrics reporting. These metrics make troubleshooting and performance tuning easier. JMX technology, which is part of the Java platform, provides a simple, standard way for applications to report metrics, with many third-party tools available to consume and present the data.

For those who might not be familiar with JMX monitoring, let’s look at a few key concepts. An MBean is a managed Java object that represents a particular component being measured or controlled. Each component can have one or more MBean attributes. The MongoDB Connector for Apache Kafka publishes MBeans under the “com.mongodb.kafka.connector” domain. Many open source tools are available to monitor JMX metrics, from the console-based JmxTerm to more feature-complete monitoring and alerting tools like Prometheus. JConsole is also available as part of the Java Development Kit (JDK).

Note: Regardless of your client tool, MBeans for the connector are only available when there are active source or sink configurations defined on the connector.

Visualizing metrics

Figure 1: Source task JMX metrics from JConsole.

Figure 1 shows some of the metrics exposed by the source connector using JConsole. In this example, a sink task was created and is called “sink-task-0” by default. The applicable metrics are shown in the JConsole MBeans panel. A complete list of both source and sink metrics will be available in the MongoDB Kafka Connector online documentation shortly after the release of 1.8.

MongoDB Atlas is a great platform to store, analyze, and visualize monitoring metrics produced by JMX. If you’d like to try visualizing the connector’s JMX metrics in MongoDB Atlas, check out jmx2mongo, a tool that continuously writes JMX metrics to a MongoDB time series collection. Once the data is in MongoDB Atlas, you can easily create charts from it, like the following:

Figure 2: MongoDB Atlas Chart showing successful batch writes vs. writes greater than 100 ms.

Figure 2 shows the number of successful batch writes performed by a MongoDB sink task and the number of those batch writes that took longer than 100 ms to execute. There are many other monitoring use cases; check out the latest MongoDB Kafka Connector documentation for more information.

Extended debugging

Over the years, the connector team has collected requests from users to improve error messages and provide additional debug information for troubleshooting. In 1.8, you will notice additional log messages and more descriptive errors. For example, before 1.8, if you set the copy.existing parameter, you might get the unclear log message “Shutting down executors.” The message now reads: “Finished copying existing data from the collection(s).” These debugging improvements, combined with the new JMX metrics, make it easier to gain insight into the connector and troubleshoot issues you may encounter.
If you have ideas for additional metrics or scenarios where additional debugging messages would be helpful, please let us know by filing a JIRA ticket. For more information on the latest release, check out the MongoDB Kafka Connector documentation. To download the connector, go to the MongoDB Connector repository on GitHub or download it from the Confluent Hub.
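To sketch the pattern a tool like jmx2mongo follows (the schema and attribute names below are our own illustration, not jmx2mongo’s actual output), JMX samples can be stored in and aggregated from a time series collection like so:

    from datetime import datetime, timedelta
    from pymongo import MongoClient

    db = MongoClient("<your Atlas connection string>").kafka_metrics

    # One document per JMX sample; the MBean name lives in the meta field.
    db.create_collection(
        "jmx",
        timeseries={"timeField": "ts", "metaField": "mbean", "granularity": "seconds"},
    )

    db.jmx.insert_one({
        "ts": datetime.utcnow(),
        "mbean": "com.mongodb.kafka.connector:type=sink-task-metrics,task=sink-task-0",
        "batch_writes_successful": 42,   # illustrative attribute names
        "batch_writes_over_100ms": 3,
    })

    # Roll up the last hour -- the kind of series behind Figure 2.
    since = datetime.utcnow() - timedelta(hours=1)
    totals = db.jmx.aggregate([
        {"$match": {"ts": {"$gte": since}}},
        {"$group": {
            "_id": "$mbean",
            "successful": {"$sum": "$batch_writes_successful"},
            "slow": {"$sum": "$batch_writes_over_100ms"},
        }},
    ])
    for doc in totals:
        print(doc)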

September 19, 2022
Updates

Introducing the Ability to Independently Scale Analytics Node Tiers for MongoDB Atlas

We’re excited to announce analytics node tiers for MongoDB Atlas. Analytics node tiers provide greater control and flexibility by allowing you to customize the exact infrastructure you need for your analytics workloads.

Analytics node tiers provide control and flexibility

Until now, analytics nodes in MongoDB Atlas clusters have used the same cluster tier as all other nodes. However, operational and analytical workloads can vary greatly in their resource requirements. Analytics node tiers let you enhance the performance of your analytics workloads by choosing the best tier size for your needs, whether larger or smaller than the operational nodes in your cluster. This added level of customization ensures you achieve the performance required for both transactional and analytical queries, without over- or under-provisioning your entire cluster for the sake of the analytical workload. Analytics node tiers are available in both Atlas and Atlas for Government.

A standard replica set contains a primary node for reads and writes and two secondary nodes that are read-only. Analytics nodes provide an additional read-only node dedicated to analytical reads.

Choose a higher or lower analytics node tier based on your analytics needs

Teams whose BI dashboards serve large user bases may want to increase their analytics node tier above that of their operational nodes. Choosing a higher tier can be useful when you have many users or need more memory to serve analytics. Scaling up the entire cluster tier would be costly, but scaling up just the analytics node tier helps optimize cost. Conversely, teams with lighter or inconsistent needs may want to decrease their analytics node tier below that of their operational nodes; a lower tier gives you flexibility and cost savings when you have fewer users or analytics is not your top priority.

With analytics node tiers, you get more discretion and control over how you manage analytics workloads by choosing the appropriately sized tier for your needs. Get started today by setting up a new cluster or adding analytics nodes to an existing cluster, and check out our documentation to learn more.
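At query time, you target these nodes with a read preference tag; Atlas tags analytics nodes with nodeType:ANALYTICS. A minimal pymongo sketch (the connection string, database, and field names are placeholders):

    from pymongo import MongoClient

    # Routing reads to nodeType:ANALYTICS keeps long-running analytical
    # queries off the operational (primary and secondary) nodes.
    client = MongoClient(
        "<your Atlas connection string>",
        readPreference="secondary",
        readPreferenceTags="nodeType:ANALYTICS",
    )

    pipeline = [{"$group": {"_id": "$region", "revenue": {"$sum": "$amount"}}}]
    for doc in client.sales.orders.aggregate(pipeline):
        print(doc)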

August 3, 2022
Updates

7 Big Reasons to Upgrade to MongoDB 6.0

First announced at MongoDB World 2022, MongoDB 6.0 is now generally available and ready for download. MongoDB 6.0 includes the capabilities introduced in the previous 5.1–5.3 Rapid Releases and debuts new abilities to help you address more use cases, improve operational resilience at scale, and secure and protect your data. The common theme in MongoDB 6.0 is simplification: Rather than forcing you to turn to external software or third-party tools, these new capabilities let you develop, iterate, test, and release applications more rapidly. The latest release helps developers avoid data silos, confusing architectures, time wasted integrating external tech, missed SLAs and other opportunities, and the need for custom work (such as pipelines for exporting data). Here’s what to expect in MongoDB 6.0.

1. Even more support for working with time series data

Used in everything from financial services to e-commerce, time series data is critical for modern applications. Properly collected, processed, and analyzed, time series data is a gold mine of insights, from user growth to promising areas of revenue, helping you grow your business and improve your application. First introduced in MongoDB 5.0, time series collections provide a way to handle these workloads without resorting to a niche technology and the complexity it brings. Just as critically, they overcome obstacles unique to time series data, such as high volume, storage and cost considerations, and gaps in data continuity (caused, for example, by sensor outages).

Since their introduction, time series collections have been continuously improved through a string of rapid releases. We began by introducing sharding for time series collections (5.1) to better distribute data, then rolled out columnar compression (5.2) to shrink storage footprints, and added densification and gap-filling (5.3) so teams can run time series analytics even when data points are missing.

As of 6.0, time series collections support secondary and compound indexes on measurements, improving read performance and opening up new use cases like geo-indexing. By attaching geographic information to time series data, developers can broaden analysis to scenarios involving distance and location, such as tracking temperature fluctuations in refrigerated delivery vehicles during a hot summer day or monitoring the fuel consumption of cargo vessels on specific routes. We’ve also improved query performance and sort operations. For example, MongoDB can now return the last data point in a series directly, rather than scanning the whole collection, for faster reads, and you can use clustered and secondary indexes to efficiently sort on time and metadata fields.
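As a hedged sketch of what that looks like in practice (the database, collection, and field names here are ours), creating a 6.0 time series collection with secondary, compound, and geo indexes on measurements might look like this in pymongo:

    from pymongo import MongoClient, ASCENDING, DESCENDING

    db = MongoClient("<connection string>").fleet

    # Time series collection for temperature readings from delivery trucks.
    db.create_collection(
        "temperatures",
        timeseries={"timeField": "ts", "metaField": "truck", "granularity": "minutes"},
    )

    # New in 6.0: secondary and compound indexes on measurement fields...
    db.temperatures.create_index([("truck.id", ASCENDING), ("temp", DESCENDING)])
    # ...including geo indexes for distance- and location-aware analysis.
    db.temperatures.create_index([("location", "2dsphere")])

    # The last data point in the series, without scanning the collection.
    latest = db.temperatures.find_one(sort=[("ts", DESCENDING)])
    print(latest)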
2. A better way to build event-driven architectures

With the advent of applications like Seamless or Uber, users have come to expect real-time, event-driven experiences such as activity feeds, notifications, and recommendation engines. But moving at the speed of the real world is not easy: your application must quickly identify and act on changes in your data. Introduced in MongoDB 3.6, change streams provide an API to stream any changes to a MongoDB database, cluster, or collection without the high overhead of polling your entire system. This way, your application can react automatically, whether by generating an in-app message notifying you that your delivery has left the warehouse or by creating a pipeline to index new logs as they are generated.

The MongoDB 6.0 release takes change streams to the next level. Now you can get the before and after state of a changed document, enabling you to send updated versions of entire documents downstream, reference deleted documents, and more. Change streams also now support data definition language (DDL) operations, such as creating or dropping collections and indexes. To learn more, check out our blog post on change streams updates.

3. Deeper insights from enriched queries

MongoDB’s aggregation capabilities allow users to process multiple documents and return computed results. By combining individual operators into aggregation pipelines, you can build complex data processing pipelines to extract the insights you need. MongoDB 6.0 adds capabilities to two key operators, $lookup and $graphLookup, improving JOINs and graph traversals, respectively. Both now fully support sharded deployments.

The performance of $lookup has also been upgraded. If there is an index on the foreign key and a small number of documents have been matched, $lookup gets results between 5 and 10 times faster than before. If a larger number of documents are matched, $lookup is twice as fast as previous iterations. If no indexes are available (as in exploratory or ad hoc queries), $lookup yields a hundredfold performance improvement.

The introduction of read concern snapshot and the optional atClusterTime parameter lets your applications run complex analytical queries against a globally and transactionally consistent snapshot of your live, operational data. Even as data changes beneath you, MongoDB preserves point-in-time consistency of the query results returned to your users. These point-in-time analytical queries can span multiple shards with large distributed datasets. By routing them to secondaries, you can isolate analytical workloads from transactional queries, with both served by the same cluster, avoiding slow, brittle, and expensive ETL to data warehouses. To learn more, visit our documentation.

4. More operators, less work

Boost your productivity with a slate of new operators that let you push more work to the database while spending less time writing code or manipulating data manually. These new operators automate key commands and long sequences of code, freeing up developer time for other tasks. For instance, you can discover important values in your data set with operators like $maxN, $minN, and $lastN, and use $sortArray to sort the elements of an array directly in your aggregation pipelines, as in the sketch below.
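For example (collection and field names invented for illustration), here is how $maxN and $sortArray read in pymongo pipelines:

    from pymongo import MongoClient

    db = MongoClient("<connection string>").shop

    # $maxN (new in 6.0): the three largest order amounts per store.
    top_sales = db.orders.aggregate([
        {"$group": {
            "_id": "$store",
            "topSales": {"$maxN": {"input": "$amount", "n": 3}},
        }},
    ])

    # $sortArray (new in 6.0): sort an embedded array inside the pipeline,
    # here ordering each order's line items by descending price.
    sorted_items = db.orders.aggregate([
        {"$project": {
            "items": {"$sortArray": {"input": "$items", "sortBy": {"price": -1}}},
        }},
    ])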
5. More resilient operations

From the beginning, MongoDB’s replica set design has allowed users to withstand and overcome outages. Initial sync is how a replica set member loads a full copy of data from an existing member; it is critical for catching up nodes that have fallen behind and for adding new nodes to improve resilience, read scalability, or query latency. MongoDB 6.0 introduces initial sync via file copy, which is up to four times faster than existing methods. This feature is available with MongoDB Enterprise Server.

In addition to the work on initial sync, MongoDB 6.0 introduces major improvements to sharding, the mechanism that enables horizontal scalability. The default chunk size for sharded collections is now 128 MB, meaning fewer chunk migrations and higher efficiency both from a networking perspective and in internal overhead at the query routing layer. A new configureCollectionBalancing command also allows a collection to be defragmented in order to reduce the impact of the sharding balancer.

6. Additional data security and operational efficiency

MongoDB 6.0 includes new features that eliminate the need to choose between secure data and efficient operations. Since its general availability in 2019, client-side field-level encryption (CSFLE) has helped many organizations manage sensitive information with confidence, especially as they migrate more of their application estate into the public cloud. With MongoDB 6.0, CSFLE includes support for any KMIP-compliant key management provider. As a leading industry standard, KMIP streamlines the storage, manipulation, and handling of cryptographic objects like encryption keys and certificates.

MongoDB’s support for auditing allows administrators to track system activity for deployments with multiple users, ensuring accountability for actions taken across the database. While it is important that auditors can inspect audit logs to assess activities, the content of an audit log has to be protected from unauthorized parties, as it may contain sensitive information. MongoDB 6.0 allows administrators to compress and encrypt audit events before they are written to disk, using their own KMIP-compliant key management system. Encrypting the logs protects the events' confidentiality and integrity: even if the logs propagate through a central log management system or SIEM, they stay encrypted.

Additionally, Queryable Encryption is now available in preview. Announced at MongoDB World 2022, this pioneering technology enables you to run expressive queries against encrypted data, decoding the data only when it is made available to the user. Data remains encrypted throughout its lifecycle, and rich queries run efficiently without the data having to be decrypted first. For a deep dive into the inner workings of Queryable Encryption, check out this feature story in Wired.

7. A smoother search experience and seamless data sync

Alongside the 6.0 major release, MongoDB is also making ancillary features generally available or available in preview. The first is Atlas Search facets, which enable fast filtering and counting of results so that users can easily narrow their searches and navigate to the data they need. Released in preview at MongoDB World 2022, facets now include support for sharded collections.

Another important addition is Cluster-to-Cluster Sync, which enables you to effortlessly migrate data to the cloud; spin up dev, test, or analytics environments; and support compliance requirements and audits. Cluster-to-Cluster Sync provides continuous, unidirectional data synchronization between two MongoDB clusters across any environment, be it hybrid, Atlas, on-premises, or edge. You can also control and monitor the synchronization process in real time, starting, stopping, resuming, or even reversing the synchronization as needed.
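To give a flavor of the facets feature mentioned above: a faceted query runs through the $searchMeta stage and returns bucketed counts without fetching the documents themselves. Here is a sketch against a hypothetical movies collection; the index name, field names, and the requirement that genres be indexed as a string facet are assumptions to verify against the Atlas Search docs:

    from pymongo import MongoClient

    db = MongoClient("<connection string>").sample_mflix

    results = db.movies.aggregate([
        {"$searchMeta": {
            "index": "default",   # assumed Atlas Search index name
            "facet": {
                "operator": {"range": {"path": "year", "gte": 2000, "lte": 2015}},
                "facets": {
                    "genresFacet": {"type": "string", "path": "genres"},
                },
            },
        }},
    ])
    for doc in results:
        print(doc)   # per-genre bucket counts plus a total match count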
Ultimately, MongoDB 6.0’s new abilities are intended to facilitate development and operations, remove data silos, and eliminate the complexity that accompanies the unnecessary use of separate niche technologies. That means less custom work, troubleshooting, and confusing architecture, and more time brainstorming and building. MongoDB 6.0 is not an automatic upgrade unless you are using Atlas serverless instances. If you are not an Atlas user, download MongoDB 6.0 directly from the download center. If you are already an Atlas user with a dedicated cluster, take advantage of the latest, most advanced version of MongoDB: here’s how to upgrade your clusters to MongoDB 6.0.

July 19, 2022
Updates

Change Streams in MongoDB 6.0 Support Pre- and Post-Image Retrieval, DDL Operations, and More

Introduced with MongoDB 3.6, a MongoDB change stream is an API on top of the operations log (oplog) that allows users to subscribe their applications to data changes in a collection, database, or entire deployment. It makes it easy for teams to build event-driven applications or systems on MongoDB that capture and react to data changes in near real time, with no middleware or database-polling scripts required. For MongoDB 6.0, we have enhanced change streams with new functionality that addresses a wider range of use cases while improving performance.

Change streams now allow users to easily retrieve the before and after state of an entire document (sometimes referred to as pre- and post-images, respectively) when a document is updated or deleted. Suppose you store user sessions in a collection and use a time-to-live (TTL) index to delete sessions as they expire: you can now reference data in the deleted documents to give the end user more information about their session after the fact. Or maybe you need to send an updated version of the entire document to a downstream system each time there is a data change. Support for retrieving the before and after states of a document greatly expands the use cases change streams can address.

Prior to MongoDB 6.0, change streams only supported data manipulation language (DML) operations. Change streams in MongoDB 6.0 now also support data definition language (DDL) operations, such as creating and dropping indexes and collections, so you can react to database events in addition to data changes.

Change streams are built on MongoDB’s aggregation framework, which lets teams not only capture and react to data changes but also filter and transform the associated notifications as needed. With MongoDB 6.0, change streams that use filtering have those stages automatically pushed to the optimal position within the change stream pipeline, dramatically improving performance.

We’re excited to announce these enhancements to change streams with MongoDB 6.0 and look forward to seeing the applications and systems you’ll build with this expanded feature set. To learn more, visit our docs.
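Here is a minimal pymongo sketch of the new capability (the collection name is ours). Pre- and post-images are opt-in per collection; once enabled, watch() can request them:

    from pymongo import MongoClient

    db = MongoClient("<connection string>").app

    # Opt the collection in to storing pre- and post-images (new in 6.0).
    db.create_collection(
        "sessions",
        changeStreamPreAndPostImages={"enabled": True},
    )

    with db.sessions.watch(
        full_document="whenAvailable",                # post-image on updates
        full_document_before_change="whenAvailable",  # pre-image on updates/deletes
    ) as stream:
        for change in stream:
            print(change["operationType"])
            print(change.get("fullDocumentBeforeChange"))  # document before the change
            print(change.get("fullDocument"))              # document after the change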

July 19, 2022
Updates

Announcing Atlas Data Federation and Atlas Data Lake

Two years ago, we released the first iteration of Atlas Data Lake. Since then, we’ve helped customers combine data from various storage layers to feed downstream systems. But after years spent studying our customers’ experiences, we realized we hadn’t gone far enough. To truly unleash the genius in all our developers, we needed to add an economical cloud object storage solution with a rich MQL query experience to the world of Atlas. Today, we’re thrilled to announce that our new Atlas Data Federation and Atlas Data Lake offerings do just that.

We now offer two complementary services: Atlas Data Federation (our existing query service, formerly known as Atlas Data Lake) and our new and improved Atlas Data Lake (a fully managed, analytics-oriented storage service). Together, these services (both in preview) provide flexible and versatile options for querying and transforming data across storage services, as well as a MongoDB-native analytic storage solution. With these tools, you can query across multiple clusters, move data into self-managed cloud object storage for consumption by downstream services, query a workload-isolated, inexpensive copy of cluster data, compare your cluster data across different points in time, and much, much more.

In hearing from our customers about their experiences with Atlas Data Lake, we learned where they have struggled, as well as the features they’ve been looking for us to provide. With this in mind, we decided to rename our current query federation service to Atlas Data Federation to better align with how customers see the service and get value from it. We’ve seen many customers benefit from the flexibility of a federated query engine, including querying data across multiple clusters, databases, and collections, as well as exporting data to third-party systems.

We also saw where our customers were struggling with data lakes. We heard them ask for a fully managed storage solution so they could achieve all of their analytic goals within Atlas. Specifically, customers wanted scalable storage that would provide high query performance at a low cost. Our new Data Lake provides a high-performance analytic object storage solution, allowing customers to query historical data with no additional formatting or maintenance work on their end.

How it works

Atlas Data Federation encompasses our existing Data Lake functionality with several new affordances. It continues to deliver the same power it always has, with increased performance and efficiency. The new Atlas Data Lake lets you create Data Lake pipelines (based on your Atlas cluster backup schedules) and choose fields on which to optimize queries. The service takes the following steps:

- On the selected schedule, a copy of your collection is extracted from your Atlas backup with no impact on your cluster.
- During extraction, we build partition indexes based on the contents of your documents and the fields you’ve selected for optimization. These indexes capture the minimums, maximums, and other statistics of the records in each partition, letting your queries quickly find the relevant data.
- Finally, the underlying data lands in an analytics-oriented format inside cloud object storage, minimizing the data scanned when you execute a query.

Once a pipeline has run and a Data Lake dataset has been created, you can select it as a data source in our new Data Federation query experience.
You can either set it as the source for a specific virtual collection in a Federated Database Instance, or have your Federated Database Instance generate a collection name for each dataset your pipeline has created. No part of this process consumes compute resources from your cluster: neither the export nor the querying of datasets. These datasets provide workload isolation and consistency for long-running analytic queries, and a target for ETL jobs using the powerful $out to S3 (sketched below), making it easy to compare the state of your data over time.

Advanced though this is, it’s only the beginning of the story. We’re committed to evolving the service, improving performance, adding more sources of data, and building new features, all based on the feedback you, the user, give us. We can’t wait to see how you’ll use this powerful new tool, and we can’t wait to hear what you’d like to see next.

Try Atlas Data Lake Today
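As a sketch of the $out-to-S3 path mentioned above (the bucket details are placeholders, and the exact option names should be checked against the Data Federation documentation), an export from a Federated Database Instance might look like:

    from pymongo import MongoClient

    # Connection string of a Federated Database Instance (placeholder).
    fed = MongoClient("<federated database connection string>")

    fed.sales.orders.aggregate([
        {"$match": {"status": "closed"}},
        # Land the result set in your own S3 bucket in an analytic format.
        {"$out": {
            "s3": {
                "bucket": "analytics-exports",
                "region": "us-east-1",
                "filename": "closed-orders/",
                "format": {"name": "parquet"},
            }
        }},
    ])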

June 7, 2022
Updates

Keeping Data in Sync Anywhere with Cluster-to-Cluster Sync

For over a decade, MongoDB users have been deploying clusters for some of their most important workloads. We work with customers running MongoDB in a variety of environments, but three stand out:

- Globally distributed cloud clusters (Atlas and self-managed): Enterprises have been successfully running cloud-based applications, in multiple zones and regions, for 10-plus years. More recently, the deployment of globally distributed multi-cloud data clusters has provided tremendous value and flexibility for modern applications. The last two years of the pandemic accelerated the proliferation of cloud data clusters to support new application services and workloads.
- On-premises clusters: Many leading companies and government institutions remain reliant on their on-premises systems for various reasons, including regulatory compliance, data governance, existing line-of-business application integrations, or legacy investments.
- Edge clusters: Organizations also distribute workloads to edge systems to bring enterprise applications closer to data sources, such as local edge servers ingesting sensor data from IoT devices. This proximity to data at its source can deliver substantial business benefits, including improved response times and faster insights.

Keeping hybrid data clusters in sync is challenging

Because of diverse data origins and the way apps evolve, maintaining data stores in hybrid environments (distributing data between different environments, or between multiple clusters in a single environment) can be challenging. As application owners innovate and expand to new data environments, a big part of their success depends on effective data synchronization between clusters. Cluster data synchronization requires:

- Support for globally distributed hybrid data clusters. All cluster data must be synchronized between different types of clusters.
- Continuous synchronization. A constant, nonstop stream of data that seamlessly flows across cluster deployments and is accessible by apps connecting to those deployments.
- Resumability. The ability to pause and resume data synchronization from where you left off.

The need for hybrid, inter-cluster data sync

By default, a MongoDB cluster natively distributes and synchronizes data globally within a single cluster. We automate this intra-cluster movement of data using replica sets and sharded clusters, two configurations that let you replicate data across multiple zones, geographical regions, and even multi-cloud configurations. But there are occasions when users want to go beyond a single MongoDB cluster and synchronize data to a separate (inter-cluster) configuration, for use cases such as:

- Migrating to MongoDB Atlas
- Creating separate development and production environments
- Supporting DevOps strategies (e.g., blue-green deployments)
- Deploying dedicated analytics environments
- Meeting locality requirements for auditing and compliance
- Maintaining preparedness for a stressed exit (e.g., reverse cloud migration)
- Moving data to the edge

Introducing Cluster-to-Cluster Sync

We designed Cluster-to-Cluster Synchronization to solve the challenges of inter-cluster data synchronization. It provides continuous, unidirectional data synchronization of two MongoDB clusters (source to destination) in the same or hybrid environments.
With Cluster-to-Cluster Sync, you have full control of your synchronization process, deciding when to start, stop, pause, resume, or reverse the direction of synchronization. You can also monitor the progress of the synchronization in real time.

Availability

Cluster-to-Cluster Sync is now generally available as part of MongoDB 6.0. Currently, it is compatible only with source and destination clusters running MongoDB 6.0 or later.

What's next?

To get started with Cluster-to-Cluster Sync, you need mongosync, a downloadable, self-hosted tool that enables data movement between two MongoDB clusters. Get started today:

- Download Cluster-to-Cluster Sync
- Read the Cluster-to-Cluster Sync docs
- Learn more about Cluster-to-Cluster Sync
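Once mongosync is running against the two clusters, it is driven through a small HTTP API. The sketch below reflects our reading of the mongosync docs (default port, routes, and payload shape); confirm them for your version:

    import requests

    MONGOSYNC = "http://localhost:27182"  # mongosync's default API port

    # Begin continuous sync from cluster0 (source) to cluster1 (destination).
    requests.post(f"{MONGOSYNC}/api/v1/start", json={
        "source": "cluster0",
        "destination": "cluster1",
    }).raise_for_status()

    # Monitor progress in real time.
    print(requests.get(f"{MONGOSYNC}/api/v1/progress").json())

    # Pause and resume the synchronization as needed.
    requests.post(f"{MONGOSYNC}/api/v1/pause", json={})
    requests.post(f"{MONGOSYNC}/api/v1/resume", json={})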

June 7, 2022
Updates
