MongoDB Updates

The newest releases and freshest updates

Atlas Charts Adds a Dedicated Hub for Managing Embedded Charts and Dashboards

Since the release of the Charts Embedding SDK in May 2020, developers have been exploring powerful new ways to visualize and share data from their MongoDB Atlas clusters. Embedding charts and dashboards is a valuable use case for Charts users, and the new Embedding Page streamlines the embedding experience for first-time users and veterans alike.

Everything you need on one screen

Don’t worry if the concept of embedding within the MongoDB Charts platform is new to you. The Getting Started tab provides configuration guidance and links to video references, code snippets, live sandboxes, and other resources to help you get started.

But just as your applications evolve with your needs, your embedding requirements may also change over time. Once you have set up an embedded dashboard or chart, the Items tab acts as the landing page. Think of it as a live snapshot of your current embedding environment. You’ll see a list of all of your charts grouped by their dashboards, and you can search by title or description and filter the list to show only dashboards. Within each row, you can view a chart or dashboard’s embedded status, see which type of embedding is enabled, view and copy the embedding ID, and access the full suite of embedding settings available for each item. This means you can add filters or change your embedding method without having to know exactly where every chart or related setting lives, and you can operate with confidence from a single page. How cool is that?

Authentication settings

The Charts SDK allows you to configure unauthenticated embedding for dashboards or charts, a painless way to share these items in a safe and controlled environment. Depending on your use case, though, that setup may be more open than you’d like. The Authentication Settings tab contains authentication provider settings, giving project owners a single source of truth for adding and maintaining providers.

Our focus for this feature is simplicity and consolidation. We believe wholeheartedly that if you spend less time hunting down where to configure settings or find resources, you can focus more on what really matters: building great software. For more information on authentication options, read our documentation.

New to MongoDB Atlas Charts? Get started today by logging in to or signing up for MongoDB Atlas, deploying or selecting a cluster, and activating Charts for free.

May 18, 2022
Updates

Ruby Added to MongoDB Export to Language for Compass and VS Code

Thousands of developers rely on Compass as a GUI or VS Code as an integrated development environment to query their data in MongoDB. With both Compass and the official MongoDB extension for VS Code, you can build a query or aggregation using the MongoDB Query API and export it to your chosen programming language. The only limitation? Until now, only four languages were supported for this feature in both tools: Java, Node.js, C#, and Python.

Those four languages cover a significant percentage of the MongoDB developer community, but we wanted to help even more developers export queries and aggregations to their language of choice. To that end, we’re pleased to announce that the Export to Language feature in both Compass and the VS Code extension for MongoDB now supports exporting to Ruby.

To build a query or aggregation in VS Code and export it to Ruby, connect to your cluster in VS Code, create a Playground with code that draws on the Query API, and highlight your Query API syntax. From there, you will see a lightbulb icon that gives you the option to export to Ruby, among other languages. You also have the option to export a sample query or aggregation that includes details on driver usage, so you start with a fully functional code snippet.

To build a query or aggregation in Compass and export it to Ruby, simply connect to your cluster from Compass, navigate to the “Aggregations” tab, build your query or aggregation, and then click the button with the export icon immediately to the right of the “Save” button.

Once you’ve followed the steps above in VS Code or Compass, you’re ready to use the exported code in your Ruby app! For a peek under the hood at how MongoDB’s engineers added Ruby support to Compass, check out this great article on Dev.to by Rachelle Palmer, Product Lead for Developer Experience at MongoDB. We hope all the hardcore Rubyists out there find this new feature useful and that it makes it even easier to build Ruby apps with MongoDB. As you continue to use these tools in your application development cycle, don’t hesitate to reach out and give us feedback.

April 1, 2022
Updates

Improved Experience for Saved Aggregations and Queries in MongoDB Compass

Tens of thousands of MongoDB users take advantage of MongoDB Compass to query their data and build sophisticated aggregation pipelines. As an easy-to-use GUI, Compass lets you seamlessly connect to and interact with your data, including through our powerful Query API. You just connect to your cluster, navigate to your chosen database and collection, and start building your queries.

Many Compass users want to come back again and again to their best queries, or make a query available to all database users, but the experience of working with saved queries and aggregations has created some challenges in the past. Previously, saved aggregations and queries were bound to a specific database and collection, making it harder to integrate them into the standard software development lifecycle. If, for example, you built an aggregation pipeline against a staging database and saved it, you’d still have to rebuild that same pipeline to use it against your production database. Users have also reported difficulty finding their favorites after saving them.

That’s why we’ve released a new and improved experience for saved aggregations and queries in MongoDB Compass. It includes a new “My Queries” screen you can reach from the left sidebar or from a tab at the top, next to the “Database” and “Performance” tabs. Once on the “My Queries” screen, you can search across all your saved queries and aggregations and sort or filter by database or collection. And you can apply your saved queries and aggregations across namespaces.

To learn more about working with queries and aggregations in Compass, visit our documentation on the Aggregation Pipeline Builder or queries. We’re confident this new experience will make it easier than ever to build, save, and reuse your favorite aggregations and queries, and ultimately remove friction in integrating them into the application development process. Head over to your Compass instance and check it out. (If you’re not yet a Compass user, you can download it for free.) Happy querying!
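To make the cross-namespace idea concrete in code, reusing one saved pipeline against different namespaces is analogous to applying a single pipeline definition to different collections with a driver. Below is a minimal, hypothetical PyMongo sketch; the URI, database, collection, and field names are placeholders and are not part of Compass itself.

from pymongo import MongoClient

# Hypothetical pipeline of the kind you might build and save in Compass:
# count documents per status, most common first.
pipeline = [
    {"$group": {"_id": "$status", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")

# The same pipeline definition can be applied to a staging and a production namespace.
staging_counts = list(client["staging"]["orders"].aggregate(pipeline))
production_counts = list(client["production"]["orders"].aggregate(pipeline))
print(staging_counts, production_counts)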

April 1, 2022
Updates

Introducing the Newest Version of the MongoDB Spark Connector

MongoDB has just released an all-new version of our Spark Connector. This article discusses the background behind the MongoDB Spark Connector and some of the key features of the new release.

Why a new version?

The current version of the MongoDB Spark Connector was written in 2016 and is based on Version 1 of the Spark Data Source API. This API is still supported, but Databricks has released an updated version of the API, making it easier for data sources like MongoDB to work within the Spark ecosystem. Because the new connector is built on Version 2 of the Data Source API, you’ll immediately benefit from capabilities such as tighter integration with Spark Structured Streaming. MongoDB will continue to support Version 1 until Databricks deprecates its Data Source API, but no new features will be implemented, and upgrades to that connector will include only bug fixes and support for the current version.

Which version should I use?

The new Spark Connector (Version 10.0) is not intended to be a direct replacement for applications that use the current MongoDB Spark Connector. Note that the new connector uses a different namespace, “com.mongodb.spark.sql.connector.MongoTableProvider”, versus the original Spark Connector, which uses “com.mongodb.spark.DefaultSource”. Having a different namespace makes it possible to use both versions of the Connector within the same Spark application. This is helpful for unit testing your application with the new Connector and making the transition on your own timeline.

Also note a change in how the MongoDB Spark Connector is versioned. The current version of the existing MongoDB Spark Connector is 3.0. Until now, connector version numbers were aligned with the version of Spark that was supported; for example, Version 2.4 of the MongoDB Spark Connector works with Spark 2.4. Going forward, this will not be the case. MongoDB's documentation will make clear which versions of Spark the Connector supports and provide the appropriate information.

Structured Streaming to MongoDB

Apache Spark comes with a stream processing engine called Structured Streaming, which is based on Spark's SQL engine and DataFrame APIs. Spark Structured Streaming treats each incoming stream of data as a microbatch, continually appending each microbatch to the target dataset. This makes it easy to convert existing Spark batch jobs into streaming jobs. Structured Streaming provides maximum throughput via the same distributed capabilities that have made Spark such a popular platform. In the following example, we’ll show you how to stream data to MongoDB using Structured Streaming. Consider a CSV file that contains natural gas prices. The following PySpark code will read the CSV file into a stream, compute a moving average, and stream the results into MongoDB.
from pyspark.sql.types import StructType, DateType, StringType, TimestampType, DoubleType
from pyspark.sql import functions as F
from pyspark.sql.window import Window
from pyspark.sql.functions import lit, count

sc.setLogLevel('DEBUG')

readSchema = (
    StructType()
    .add('Type', StringType())
    .add('Date', TimestampType())
    .add('Price', DoubleType())
)

ds = (spark
      .readStream.format("csv")
      .option("header", "true")
      .schema(readSchema)
      .load("daily*.csv"))

slidingWindows = (ds
                  .withWatermark("Date", "1 minute")
                  .groupBy(ds.Type, F.window(ds.Date, "7 day"))
                  .avg()
                  .orderBy(ds.Type, 'window'))

dsw = (
    slidingWindows
    .writeStream
    .format("mongodb")
    .queryName("7DaySlidingWindow")
    .option("checkpointLocation", "/tmp/pyspark/")
    .option("forceDeleteTempCheckpointLocation", "true")
    .option('spark.mongodb.connection.uri', 'MONGODB CONNECTION HERE')
    .option('spark.mongodb.database', 'Pricing')
    .option('spark.mongodb.collection', 'NaturalGas')
    .outputMode("complete"))

query = dsw.start()
query.processAllAvailable()
query.stop()

For more information and examples on the new MongoDB Spark Connector V10.0, check out our documentation. Ask questions and give feedback on the MongoDB Community Forum. The Connector is open sourced; feel free to contribute at GitHub.
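As a companion to the streaming example, reading the collection back in a batch job also goes through the new connector's "mongodb" format. Here is a minimal sketch mirroring the option names used above; the connection string is a placeholder, and you should consult the connector documentation for the exact configuration keys in your version.

# Batch read with the new connector; connection details are placeholders.
df = (spark.read
      .format("mongodb")
      .option('spark.mongodb.connection.uri', 'MONGODB CONNECTION HERE')
      .option('spark.mongodb.database', 'Pricing')
      .option('spark.mongodb.collection', 'NaturalGas')
      .load())

df.printSchema()
df.show(5)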

March 31, 2022
Updates

Dashboard Embedding Comes to Charts

Atlas Charts is the native data visualization tool that lets you create, share, and embed charts from data in MongoDB Atlas. Today, we are happy to announce the release of Dashboard Embedding.

What is an embedded dashboard?

Embedding lets you bring charts and dashboards into any webpage or application where your users spend time. It’s a great way to share dynamic, live visualizations with a wide audience. We see customers using embedded charts to share real-time information internally within their companies, as well as externally with their customers. Until now, embedding multiple charts where you need them has been time consuming and somewhat complex. With dashboard embedding, rather than individually embedding one chart at a time, you can embed a full dashboard in one go!

Embedding charts and dashboards is easy and provides the flexibility you need when it comes to visual customization and data security. This ensures you can freely share information and have it look the way you want, while always keeping data secure.

How does Dashboard Embedding work in Atlas Charts?

Enabling Dashboard Embedding is as simple as picking a dashboard, going into the Embed menu, and enabling either authenticated or unauthenticated embedding, as shown below.

[Image: The Dashboard Embedding menu]

Similar to embedding an individual chart, you can publicly embed a URL in an iframe in your application or website. Unauthenticated embedding allows you to embed your dashboards and have visualizations immediately ready for public consumption. Alternatively, you can use our embedding SDK to implement authenticated embedding. Authenticated embedding provides more control and security, requiring users to have the appropriate permission to access your dashboard. We support Google, Realm, and custom JSON Web Token authentication options.

Charts offers a number of configuration options to ensure your embedded dashboards look and feel the way you want. These options include support for dark mode, background color selection, chart sizing, and refresh rate cadence. Dashboards can even be set to resize responsively based on your screen size, ensuring a consistent user experience. Play around with dashboard embedding and some of these features in our code sandbox.

[Image: An embedded dashboard in dark mode]

Get started today!

Dashboard Embedding was one of our most frequently requested features in Atlas Charts, and we are excited to see how you take advantage of it in your applications! Head over to any of your dashboards in Charts to enable dashboard embedding, or take a look at GitHub to get started with the embedding SDK. New to Atlas Charts? Get started today by logging into or signing up for MongoDB Atlas, deploying or selecting a cluster, and activating Charts for free.

March 24, 2022
Updates

Introducing MongoDB’s Prometheus Monitoring Integration

Wouldn’t it be great if you could connect your data stored in the world’s leading document database to the leading open source monitoring solution? Absolutely! And now you can. Prometheus has long been a developer favorite, providing monitoring and alerting functionality for cloud-native environments. Its key features include a multi-dimensional data model with time series support, a flexible query language called PromQL to leverage that dimensionality, and no reliance on distributed storage.

MongoDB meets monitoring like never before

Our integration allows you to view MongoDB hardware and monitoring metrics directly in Prometheus. If you used MongoDB and Prometheus together before, this means you no longer have to jump back and forth between applications to view your data. Our official Prometheus integration provides complete feature parity with Atlas metrics in a secure and supported environment. With a few clicks in the UI, you can configure the integration and set up custom scrape intervals for your Atlas Admin API endpoints to ensure your view in Prometheus is updated as often as you prefer. Best of all, this integration is free and available for use with MongoDB Atlas (clusters M10 and higher) and Cloud Manager. We truly believe in the freedom to run anywhere, and that includes viewing your data in your preferred monitoring solution.

How the Prometheus Integration works with MongoDB

The MongoDB Prometheus integration converts the results of a series of MongoDB commands into the Prometheus protocol, allowing Prometheus to scrape the metrics you can view through your MongoDB monitoring charts, and more. Once Prometheus successfully collects your metrics, you can explore them in the Prometheus UI or create custom dashboards in Grafana.

Get started with the Prometheus Integration

If you already have an Atlas account, get started by following the instructions below:

1. Log in to your Atlas account.
2. Click the vertical three-dot menu next to the project dropdown in the upper left-hand corner of the screen.
3. Select “Integrations.” The Prometheus Monitoring Integration is listed here.
4. Select “Configure” on the Prometheus tile, and follow the guided setup flow.

If you don’t have an Atlas account, create an M10 or higher Atlas cluster and follow the instructions above.

Note: If you were one of the customers who requested this integration, we thank you! We appreciate your feedback and suggestions, and look forward to implementing more in the future. Input is always welcome at feedback.mongodb.com.
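If you want to confirm from code that Prometheus is scraping its targets once the integration is configured, you can call Prometheus' standard HTTP API. Here is a minimal Python sketch; it assumes a Prometheus server at localhost:9090 and uses the built-in 'up' metric, which reports scrape-target health. Nothing in it is specific to the Atlas integration itself.

import requests

# Address of your Prometheus server; adjust as needed.
PROMETHEUS_URL = "http://localhost:9090"

resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query",
    params={"query": "up"},  # 'up' is 1 when a scrape target is reachable
    timeout=10,
)
resp.raise_for_status()

# Print each scrape target's labels and its current 'up' value.
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])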

March 16, 2022
Updates

MongoDB and AWS Expand Global Collaboration

MongoDB launched as a developer-friendly, open source database in 2009, but it wasn't until 2016, when we released MongoDB Atlas, our fully managed database service, that the full vision for MongoDB truly emerged. Realizing that vision, however, has never been a solo effort. From the earliest days, MongoDB has partnered with a range of companies, but none more closely than Amazon Web Services (AWS), as we've joined forces to make the developer experience as seamless as possible.

Now we're kicking that partnership into overdrive. As announced today, MongoDB is expanding our global partnership with AWS. Though details of the agreement are confidential, the results will not be: Customers stand to benefit from deeper, broader technical integrations, improvements in migrating workloads from legacy data infrastructure to modern MongoDB Atlas, and more. For those of us who have worked to grow this partnership, it's exciting (and rewarding!) to see the scope of the work envisioned by MongoDB and AWS, together. On that note, it's worth revisiting how we got here.

Building together

From the earliest days, we've positioned MongoDB as the best way to manage a wide variety of data types and sources, in real time, at significant scale. Back then we called it "Big Data," but now we recognize it for what it is: what all modern data looks like. Then and now, MongoDB came with an open license that encouraged developers to easily access and tune the database to their needs. And so they did, with many developers opting to run their instances of MongoDB on AWS, removing the need to buy and provision servers. In fact, almost from the start of the company, we have worked closely with AWS to ensure that MongoDB users and customers would have an excellent experience running MongoDB on AWS.

It was a great start, but it wasn't enough. Developers, after all, still had to fiddle with the dials and knobs of managing the database. This began to change in 2011, when the company released the MongoDB Monitoring Service (MMS), which made it much easier to monitor MongoDB clusters of any size. By 2013, we rolled MMS, Backup, and other MongoDB services into the MongoDB Management Service, and continued to work closely with AWS to optimize these services for MongoDB customers. Then, in 2016, again with extensive AWS assistance, we launched MongoDB Atlas, a fully managed, integrated suite of cloud database and data services that accelerates and simplifies how developers build with data.

Making life easier for developers was the vision that co-founders Dwight Merriman and Eliot Horowitz had when they started MongoDB (then 10gen) in 2007. That vision has always depended on a strong partnership with AWS. This partnership got even stronger, as we just announced, with the promise of even better serverless options, expanded use of AWS Graviton instances to improve performance at lower cost, and improved hybrid options through AWS Outposts. Beyond product, we'll also be collaborating more closely to reach and educate customers through joint developer relations initiatives, programs to reach new customers, and more. As good as our partnership has been, it just got significantly better.

Although it may be convenient to focus on how the two companies compete (for example, both organizations provide database services), how we cooperate is the more compelling story. So let's talk about that.

A mutual obsession

Over the past 15 years, MongoDB has built an extensive partner ecosystem around our application data platform.
From open source mainstays like Confluent, to application development innovators like Vercel, data intelligence pioneers like BigID, and trusted system integration powerhouses like Accenture, we work closely with the best partners to ensure developers enjoy an exceptional experience working with MongoDB. As already noted, AWS is the partner we've worked with most closely for the longest time. That partnership has resulted in tight integration between MongoDB and AWS services such as AWS Wavelength, Amazon Kinesis Data Firehose, Amazon EventBridge, AWS PrivateLink, AWS App Runner, Amazon Managed Grafana, and more. We also recently announced Pay as You Go Atlas on AWS Marketplace, giving customers even more options for how they run MongoDB on AWS. Additionally, as part of our new strategic agreement, we'll be offering joint customer incentive programs to make it even easier for customers to run proofs of concept and migrate from expensive legacy data infrastructure to MongoDB Atlas running on AWS.

If this seems to paint an overly rosy picture of our partnership with AWS, it's worth remembering that the guiding principle for both AWS and MongoDB is customer obsession. Of course, we've had moments when we've disagreed over how best to take care of customers, because every partnership has its fair share of friction. But behind the scenes, our product, marketing, and sales teams have worked together for years to meet customer needs. Customers seem to recognize this: In MongoDB's most recent earnings call, we announced that we now have more than 33,000 customers, including Shutterfly, Cox Automotive, Pitney Bowes, and Nesto Software, many of which choose to run Atlas on AWS.

Still not convinced? There's perhaps no better way to understand what MongoDB can do for your organization than to try it. You can try Atlas for free, or you can choose to pay as you go by starting with Atlas on the AWS Marketplace. Either way, we hope you'll let us know what you think.

March 15, 2022
Updates

Understanding the MongoDB Stable API and Rapid Release Cadence

MongoDB provides the world’s leading application data platform, and we strive to make it as easy as possible for developers to build and evolve their applications. In MongoDB 5.0, we made two important changes to the way we release database versions and make them available to customers: One was the creation of the Stable API, and the other was our new quarterly MongoDB Rapid Release cadence. Now that we have a few Rapid Releases under our belt (visit our blog to learn about MongoDB 5.1 and 5.2), we wanted to provide an update on the API and on the process for choosing between the Major and Rapid Release tracks.

The MongoDB Stable API

The Stable API was created to make it easier for customers to upgrade to the latest MongoDB version without worrying about introducing breaking changes to their code base. It includes a subset of MongoDB commands that applications commonly use, and MongoDB ensures those commands remain consistent when we release new database versions. That effectively decouples the application lifecycle from the database lifecycle. Providing this level of consistency is especially important for helping customers consume our innovations faster and take advantage of MongoDB’s new release cadence.

The Stable API was previously known as the Versioned API, but we changed the name to avoid potential confusion. From our conversations with users and customers, it became clear that the previous name gave the impression that the API would change with each incremental MongoDB version release. That is not the case, so we say “hello” to the MongoDB Stable API.

The MongoDB Rapid Release Cadence

MongoDB Atlas customers with clusters on a Dedicated Tier (M10+) can opt in to Rapid Releases to get the latest features from MongoDB on a quarterly basis. Atlas customers start on the Major Release track by default; Major Releases happen annually and contain the previous year’s Rapid Releases.

Customers who choose the Major Release track will have the following upgrade flow: 5.0 -> 6.0 -> 7.0, and so on. They can schedule when they want to upgrade to each new Major Release after it enters general availability. Customers who opt in to the Rapid Release track will have the following upgrade flow: 5.0 -> 5.1 -> 5.2 -> 5.3 -> 6.0 -> 6.1 -> 6.2, and so on. If you are on a Major Release and decide to change tracks, you will automatically go to the latest Rapid Release. (If you are on 6.0 and 6.2 is the latest Rapid Release, you can jump directly from 6.0 to 6.2 without having to upgrade to 6.1 first.) Customers on the Major Release track will still receive regular patch upgrades.

Users on the Rapid Release track who later decide to opt out will need to do so at the time of the next Major Release. If you’re on MongoDB 5.2 and want to change back to the Major Release track, for example, you will need to wait until the next Major Release, MongoDB 6.0, is available before leaving the Rapid Release track. As another example, at the time of publication, the latest Major Release is 5.0 and the latest Rapid Release is 5.2. A customer on MongoDB 4.4 (an earlier Major Release prior to the new release cadence and numbering scheme) would need to manually upgrade from 4.4 to 5.0 before opting in to Rapid Releases and getting MongoDB 5.2.

To opt in to the Rapid Release cadence, choose the “Latest Release” option in the Atlas web UI. Rapid Releases are only supported for MongoDB Atlas. In on-premises environments, they should be used only for development builds and testing, not for production.
Apart from Dedicated Tier clusters, Atlas supports Shared Tier clusters M0, M2, and M5 (which provide 512MB, 2GB, and 5GB of storage, respectively) as well as managed serverless instances, which are currently in public preview. Shared Tier clusters are always on the Major Release track, and serverless instances are on the Rapid Release track.

With options to get Major or Rapid Releases in MongoDB Atlas and to use the Stable API for consistency across versions, MongoDB customers have more flexibility than ever in how they take advantage of the latest MongoDB database upgrades. Stay tuned for the latest innovations from MongoDB in the 5.3 release this spring, and join us at MongoDB World this summer to learn about MongoDB 6.0 and more!
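To illustrate what opting in to the Stable API looks like from application code, here is a minimal sketch using PyMongo. The connection string is a placeholder, and pinning the API version is the only change an existing application needs.

from pymongo import MongoClient
from pymongo.server_api import ServerApi

# Placeholder connection string; replace with your own Atlas URI.
uri = "mongodb+srv://<user>:<password>@cluster0.example.mongodb.net"

# Pin the connection to version "1" of the Stable API. Commands covered by the
# Stable API keep behaving consistently across server upgrades.
client = MongoClient(uri, server_api=ServerApi("1"))

print(client.admin.command("ping"))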

March 10, 2022
Updates

Speed Up Your Workflow with Query Library in Atlas Charts

We're excited to announce Query Library for Atlas Charts! Getting started with Charts is already fast, easy, and powerful. With Query Library, we have made it even easier to build charts with queries.

When you log in to Charts, there are a few essential steps to visualize your data: You need to add a data source, create a dashboard, and from there you can create a chart. The Charts UI provides a user-friendly, drag-and-drop interface for building charts, but today more than a quarter of users also leverage the MongoDB Query Language (MQL) to write custom queries when creating charts. As a simple example, using the sample movie data we make available to every Charts user through our sample dashboard, a short MQL query can filter a chart down to only movies in the comedy genre. Rather than dragging the genre field into the chart and adding a filter, a little bit of MQL knowledge can speed up the chart-building workflow. Users can now also easily save a newly created query or load a previously saved one.

Query Library builds on Charts’ existing support for queries and aggregation pipelines and makes it even more powerful to leverage MQL when building charts. Rather than recreating queries across multiple dashboards, manually sharing them with team members, copying and pasting, or otherwise retrieving queries written in the past, Charts users can either save any new query for later use or load a saved query directly from the chart builder. Best of all, these saved queries are available across your team: Any saved query is available to all members of your project. Check out our documentation for more details on saving, loading, and managing queries in Charts.

Simplifying visualization of your Atlas data

The goal of Atlas Charts is to create a data visualization experience native to MongoDB Atlas customers. It’s a quick, straightforward, and powerful tool to help you make business decisions and perform analytics on your applications. Capabilities like Query Library will speed up your data visualization workflow so you can get in and out of your data quickly and back to what matters for your team.

To get started with Query Library today, navigate to the chart builder in any of your dashboards, write a query, and save it for later use! New to Atlas Charts? Get started today by logging into or signing up for MongoDB Atlas, deploying or selecting a cluster, and activating Charts for free.

March 2, 2022
Updates

MongoDB Connector for Apache Kafka 1.7 Available Now

Today, MongoDB has released version 1.7 of the MongoDB Connector for Apache Kafka! This article highlights some of the key features of the new release.

MongoDB errors to the Dead Letter Queue

Apache Kafka version 2.6 added support for handling errant records. The MongoDB Connector for Apache Kafka automatically sends messages that it cannot process to the dead letter queue (DLQ). This has always included messages that fail during conversion, but until this release it did not include errors generated within MongoDB itself.

For example, consider a topic, “Sales.OrderStaging”, whose messages contain an ‘order-id’ field. The application needs to insert a new document into MongoDB and use that order-id as the primary key, or ‘_id’, of the document. If a duplicate order-id arrives on the Kafka topic, the message should be routed to a dead letter queue topic and the MongoDB connector should continue processing other orders. The following sink configuration highlights the parameters that support this scenario:

"errors.tolerance": "all",
"mongo.errors.tolerance": "all",
"mongo.errors.log.enable": "true",
"errors.log.include.messages": "true",
"errors.deadletterqueue.topic.name": "orders.deadletterqueue",
"errors.deadletterqueue.context.headers.enable": "true",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.InsertOneDefaultStrategy",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
"document.id.strategy.overwrite.existing": "true",
"document.id.strategy.partial.value.projection.type": "AllowList",
"document.id.strategy.partial.value.projection.list": "order-id"

Now consider a Kafka message with an order-id of 5, followed by another message with the same order-id of 5. The sink connector will try to insert the second document with the same _id, and MongoDB will generate an error as expected. The Kafka message that caused the error will be written to the orders.deadletterqueue topic. Once it is on the dead letter queue, you can inspect the errant records, update them, and resubmit them for processing.

Setting errors.deadletterqueue.context.headers.enable to true adds metadata to the DLQ message. This extra information can help with any automatic processing of errors in the queue. In addition to the DLQ, you can set the errors.log.enable and errors.log.include.messages configuration properties to write errors to the Kafka Connect log. Here is an example error from the scenario above:

com.mongodb.kafka.connect.sink.dlq.WriteException: v=1, code=11000, message=E11000 duplicate key error collection: Sales.Orders index: _id_ dup key: { _id: { order-id: 5 } }, details={}

Bulk write improvements

Today, the connector sink process performs bulk writes in an ordered fashion. For example, consider these 10 documents in a bulk operation: [1,2,3,4,5,6,7,8,9,10]. If document number 5 fails, perhaps due to a duplicate _id error, the MongoDB driver returns the error to the connector and the rest of the documents are not written to MongoDB; that is, only [1,2,3,4] end up in MongoDB. While this might be acceptable for some use cases, in scenarios with large batch sizes it can make reprocessing messages cumbersome. In version 1.7 of the connector, we introduced a new parameter, bulk.write.ordered, which defaults to true, preserving the connector's existing behavior.
Setting bulk.write.ordered to false and running the scenario above results in an end state of [1,2,3,4,6,7,8,9,10] written to MongoDB, with document 5 written to the topic defined as the dead letter queue. Note that the actual order of the documents may differ, since we set bulk.write.ordered to false. For more information on error handling, including the format of the DLQ headers, check out the MongoDB Kafka Connector documentation. To set up a dead letter queue, see the “Creating a dead letter queue” section of the Confluent Kafka Connect documentation.

Changed retry logic

Currently, the Kafka connector manages retries of writes to MongoDB using the max.num.retries and retries.defer.timeout configuration properties. This feature was originally intended to address challenges such as network connection issues. Since that time, the MongoDB drivers have implemented native capabilities that handle retry logic. The Kafka connector uses the MongoDB Java driver, which has retries enabled by default, so there are no changes or extra configuration needed to enable retries in the connector. Note: If you set retryWrites to false in the connection.uri configuration property, retries are disabled for the sink connector. If you would like to leverage the driver's native retry capability, simply remove the retryWrites parameter from the connection.uri.

Allow disk use when copying

The copy.existing feature copies existing data from the source using an aggregation pipeline that filters change stream events coming from the MongoDB source. In certain situations this pipeline can use large amounts of memory. The new copy.existing.allow.disk.use setting, which is enabled by default, allows the copy.existing aggregation to use temporary disk storage if the query requires it. Set it to false if the process running MongoDB does not have permission for disk access. For more information, see the allowDiskUse option in the aggregate() documentation.
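To make the DLQ inspection step concrete, here is a minimal sketch of a consumer that reads errant records and their error-context headers using the confluent-kafka Python client. The broker address and group id are placeholders, the topic name matches the example configuration above, and the snippet is illustrative rather than part of the connector.

from confluent_kafka import Consumer

# Minimal DLQ inspector; broker address and group id are placeholders.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "dlq-inspector",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders.deadletterqueue"])

try:
    while True:
        msg = consumer.poll(timeout=5.0)
        if msg is None:
            break
        if msg.error():
            continue
        # With errors.deadletterqueue.context.headers.enable=true, the error
        # context (exception class, message, and so on) arrives as record headers.
        print("headers:", msg.headers())
        print("value:  ", msg.value())
finally:
    consumer.close()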

February 17, 2022
Updates

Scale Out Without Fear or Friction: Live Resharding in MongoDB

Live resharding was one of the key enhancements delivered in our MongoDB 5.0 Major Release. With live resharding, you can change the shard key for your collection on demand as your application evolves, with no database downtime or complex data migrations. In this blog post, we will cover:

- Product developments that have made sharding more flexible
- What you had to do before MongoDB 5.0 to reshard your collection, and how that changed with 5.0 live resharding
- Guidance on the performance and operational considerations of using live resharding

Before that, we should discuss why you should shard at all, and the importance of selecting a good shard key, even though live resharding gives you the flexibility to change it at any time. Go ahead and skip the next couple of sections if you are already familiar with sharding!

Why Shard your Database?

Sharding enables you to distribute your data across multiple nodes. You do that to:

- Scale out horizontally: accommodate growing data or application load by sharding once your application starts to approach the capacity limits of a single replica set.
- Enforce data locality: for example, pinning data to shards provisioned in specific regions so that the database delivers low-latency local access and maintains data sovereignty for regulatory compliance.

Sharding is the best way of scaling databases, and MongoDB was developed to support sharding natively. Sharding MongoDB is transparent to your applications, and it’s elastic, so you can add and remove shards at any time.

The Importance of Selecting a Good Shard Key

MongoDB’s native sharding has always been highly flexible: you can select any field or combination of fields in your documents to shard on. This means you can select a shard key that is best suited to your application’s requirements. The choice of shard key is important because it defines how data is distributed across the available shards. Ideally, you want a shard key that:

- Gives you low-latency, high-throughput reads and writes by matching data distribution to your application’s data access patterns.
- Evenly distributes data across the cluster so no single shard takes most of the load (a “hot shard”).
- Provides linear scalability as you add more shards in the future.

While you have the flexibility to select any field(s) of your documents as your shard key, it was previously difficult to change the shard key later on. This made some developers fearful of sharding: If you chose a shard key that didn’t work well, or if application requirements changed so that the shard key no longer suited the new access patterns, the impact on performance could be significant. At this point in time, no other mainstream distributed database allows users to change shard keys, but we wanted to give users this ability.

Making Shard Keys More Flexible

Over the past few releases, MongoDB engineers have been working to give users more sharding flexibility:

- MongoDB 4.2 introduced the ability to modify a shard key’s value. Under the covers, the modification process uses a distributed, multi-document ACID transaction to change the placement of a document in a sharded cluster. This is useful when you want to rehome a document to a different geographic region or age data out to a slower storage tier.
- MongoDB 4.4 went further with the ability to refine the shard key for a collection by adding a suffix to an existing key.
Both of these enhancements made sharding more flexible, but they didn’t help if you needed to reshard your collection using an entirely different shard key.

Manual Resharding: Before MongoDB 5.0

Resharding a collection used to be a manual and complex process that could only be achieved through one of two approaches:

- Dumping the entire collection and then reloading it into a new collection with the new shard key. This is an offline process, so your application is down until the reload is complete; for example, it could take several days to dump and reload a 10 TB+ collection on a three-shard cluster.
- Undergoing a custom migration that involves writing all the data from the old cluster to a new cluster with the resharded collection. You had to write the query routing and migration logic, and then constantly check the migration progress to ensure all data had been successfully migrated. Custom migrations entail less downtime, but they come with a lot of overhead: They are highly complex, labor-intensive, risky, and expensive (you have to run two clusters side by side). It took one MongoDB user three months to complete the live migration of 10 billion documents.

How this Changed with MongoDB 5.0: Live Resharding

We made manual resharding a thing of the past with MongoDB 5.0. With 5.0, you just run the reshardCollection command from the shell, point at the database and collection you want to reshard, specify the new shard key, and let MongoDB take care of the rest:

reshardCollection: "<database>.<collection>", key: <shardkey>

When you invoke the reshardCollection command, MongoDB clones your existing collection into a new collection with the new shard key, then starts applying all new oplog updates from the existing collection to the new collection. This enables the database to keep pace with incoming application writes. When all oplog updates have been applied, MongoDB automatically cuts over to the new collection and removes the old collection in the background.

Let’s walk through an example where live resharding would really help a user. The user has an orders collection. In the past, they needed to scale out and chose the order_id field as the shard key. Now they realize that they have to regularly query each customer’s orders to quickly display order history. This query does not use the order_id field, so to return its results, all shards need to provide data for the query. This is called a scatter-gather query. It would have been more performant and scalable to have each customer’s orders localized to a shard, avoiding scatter-gather, cross-shard queries. They realize that the optimal shard key would be "customer_id: 1, order_id: 1" rather than just the order_id.

With MongoDB 5.0’s live resharding, the user can just run the reshard command, and MongoDB will reshard the orders collection for them using the new shard key, without having to bring the database and the application down. Watch our short Live Resharding talk from MongoDB.Live 2021 to see a demo with this exact example. Not only can you change the field(s) for a shard key, you can also review your sharding strategy, changing between range, hash, and zones.

Live Resharding: Performance and Operational Considerations

Even with the flexibility that live resharding gives you, it is still important to properly evaluate your choice of shard key. Our documentation provides guidance to help you make the best choice of shard key.
Of course, live resharding makes it much easier to change that key should your original choice have not been optimal, or if your application changes in a way that you hadn’t previously anticipated. If you find yourself in this situation, it is essential to plan for live resharding.

What do you need to be thinking about before resharding?

Make sure you have sufficient storage capacity available on each node of your cluster. Because MongoDB temporarily clones your existing collection, spare storage capacity needs to be at least 1.2x the size of the collection you are going to reshard; the extra 20% of storage is needed to buffer writes that occur during the resharding process. For example, if the size of the collection you want to reshard is 2 TB compressed, you should have at least 2.4 TB of free storage in the cluster before starting the resharding operation.

While the resharding process is efficient, it will still consume additional compute and I/O resources, so make sure you are not consistently running the database at or close to peak system utilization. If you see CPU usage in excess of 80% or I/O usage above 50%, you should scale up your cluster to larger instance sizes before resharding. Once resharding is done, it's fine to scale back down to regular instance sizes.

Before you run resharding, you should update any queries that reference the existing shard key to include both the current shard key and the new shard key. When resharding is complete, you can remove the old shard key from your queries. Review the resharding requirements documentation for a full rundown of the key factors to consider before resharding your collection.

What should you expect during resharding?

The total duration of the resharding process depends on the number of shards, the size of your collection, and the write load to your collection. For a constant data size, the more shards, the shorter the resharding duration. In a simple proof of concept on MongoDB Atlas, a 100 GB collection took just 2 hours 45 minutes to reshard on a 4-shard cluster and 5 hours 30 minutes on a 2-shard cluster. The process scales up and down linearly with data size and number of shards, so a 1 TB collection will take 10 times longer to reshard than a 100 GB collection. Of course, your mileage may vary based on the read/write ratio of your application along with the speed and quality of your underlying hardware infrastructure.

While resharding is in flight, you should expect the following impacts to application performance:

- The latency and throughput of reads against the collection that is being resharded will be unaffected.
- Even though we are writing to the existing collection and then applying oplog entries to both its replicas and to the cloned collection, you should expect to see negligible impact to write latency given enough spare CPU. If your cluster is CPU-bound, expect a latency increase of 5 to 10% during the cloning phase and 20 to 50% during the applying phase (*).
- As long as you meet the aforementioned capacity requirements, the latency and throughput of operations to other collections in the database won't be impacted.

(*) Note: If you notice unacceptable write latencies to your collection, we recommend you stop resharding, increase your shard instance sizes, and then run resharding again. The abort and cleanup of the cloned collection are instantaneous. If your application has time periods with less traffic, reshard your collection during that time if possible.
All of your existing isolation, consistency, and durability guarantees are honored while resharding is running. The process itself is resilient and crash-safe, so if any shard undergoes a replica set election, there is no impact to resharding; it simply resumes when the new primary has been elected. You can monitor the resharding progress with the $currentOp pipeline stage, which reports an estimate of the time remaining to complete the resharding operation. You can also abort the resharding process at any time.

What happens after resharding is complete?

When resharding is done and the two collections are in sync, MongoDB automatically cuts over to the new collection and removes the old collection for you, reclaiming your storage and returning latency back to normal. By default, cutover takes up to two seconds, during which time the collection will not accept writes, so your application will see a short spike in write latency. Any writes that time out are automatically retried by our drivers, so exceptions are not surfaced to your users.

The cutover interval is tunable: Resharding completes more quickly if you raise the interval above the two-second default, with the trade-off that the period of write unavailability will be longer. By dialing it down below two seconds, the window of write unavailability will be shorter, but the resharding process will take longer to complete, and the odds of the window ever being short enough to cut over will be diminished. You can also block writes early to force resharding to complete by issuing the commitReshardCollection command. This is useful if the current time estimate to complete the resharding operation is an acceptable duration for your collection to block writes.

What you Get with Live Resharding

Live resharding is available wherever you run MongoDB, whether that’s in our fully managed Atlas application data platform in the cloud, with Enterprise Advanced, or with the Community Edition of MongoDB. To recap how you benefit from live resharding:

- Evolve with your apps with simplicity and resilience: As your applications evolve or as you need to improve on the original choice of shard key, a single command kicks off resharding. The process is automated, resilient, and non-disruptive to your application.
- Compress weeks or months into minutes or hours: Live resharding is fully automated, so you eliminate disruptive and lengthy manual data migrations. To make scaling out even easier, you can evaluate the effectiveness of different shard keys in dev/test environments before committing your choice to production. Even then, you can change your shard key whenever you want to.
- Extend flexibility and agility across every layer of your application stack: You have seen how MongoDB’s flexible document data model instantly adapts as you add new features to your app. With live resharding, you get that same flexibility when you shard. New features or new requirements? Simply reshard as and when you need to.

Summary

Live resharding is a huge step forward in the state of distributed systems, and it is just the start of an exciting and fast-paced MongoDB roadmap that will make sharding even easier, more flexible, and more automated. If you want to dig deeper, please take a look at the Live Resharding session recording from our developer conference and review the resharding documentation. To learn more about MongoDB 5.0 and our new Rapid Releases, download our guide to what’s new in MongoDB.
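As a footnote for readers who script administrative operations, the reshardCollection command from the orders example above can also be issued through a driver as a database command. Below is a minimal, hypothetical PyMongo sketch; the connection string, namespace, and key fields are placeholders.

from pymongo import MongoClient

# Placeholder URI; resharding requires a sharded cluster running MongoDB 5.0+.
client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")

# Reshard the orders collection onto a compound shard key, as in the example above.
# The command runs against the admin database; the command name must come first,
# and Python dicts preserve insertion order.
result = client.admin.command({
    "reshardCollection": "sales.orders",
    "key": {"customer_id": 1, "order_id": 1},
})
print(result)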

January 26, 2022
Updates

Introducing MongoDB Realm’s Flexible Sync – Now Available in Preview

Twelve months ago, we made MongoDB’s edge-to-cloud data synchronization service, Realm Sync, generally available. Since then, Sync has helped hundreds of our customers build reliable, offline-first mobile apps that serve data to millions of end users, from leading telematics providers to chart-topping consumer apps.

Historically, Realm Sync has worked well for apps where data is compartmentalized and permissions rarely change, but dynamic use cases with evolving permissions required workarounds. We knew we could do more, so today we are excited to announce the next iteration of Realm Sync: Flexible Sync. With the introduction of Flexible Sync, we are redefining the sync experience by enabling even the most complex use cases out of the box, without requiring any custom code.

Intuitive query-based sync

Distinctly different from how Realm Sync operates today, Flexible Sync lets you use language-native queries to define the data synced to user applications. This more closely mirrors how you are used to building applications today, using GET requests with query parameters, making it easy to learn and fast to build to MVP.

Flexible Sync also supports dynamic, overlapping queries based on user inputs. Picture a retail app that allows users to search available inventory. As users define inputs (show all jeans that are size 8 and less than $40), the query parameters can be combined with logical ANDs and ORs to produce increasingly complex queries and narrow down the search results even further. In the same application, employees can quickly limit inventory results to only their store’s stock, pulling from the same set of documents as the customer, without worrying about overlap.

Document-level permissions

Whether it’s a company’s internal application or an app on the App Store, permissions are required in almost every application. That’s why we are excited by how seamless Flexible Sync makes applying a document-level permission model when syncing data, meaning synced documents can be limited based on a user’s role. Consider how an emergency room team would use their hospital’s application: A resident should only be able to access her patients’ charts, while her fellow needs to see the entire care team’s charts. In Flexible Sync, a user’s role is combined with the client-side query to determine the appropriate result set. For example, when the resident above filters to view all patient charts, the permission system automatically limits the results to only her patients.

Real-time collaboration optimizations

Flexible Sync also enhances query performance and optimizes for real-time collaboration by treating a single object or document as the smallest entity for synchronization. This means synced data is shared between client devices more efficiently, and conflict resolution incorporates changes faster and with less data transfer than before.

Getting started

Flexible Sync is available now. Simply sign up or log in to your cloud account, deploy a Realm app, select your sync type, and dive right in. Flexible Sync is compatible with MongoDB 5.0, which is available with dedicated Atlas database clusters (M10 and higher). Shared-tier cluster support for 5.0 and Flexible Sync will be made available in mid-February. Have questions? Check out our documentation or the more detailed announcement post on the Developer Hub.

Looking ahead

Our goal with Flexible Sync is to deliver a sync service that can fit any use case or schema design pattern imaginable, without custom code or workarounds.
And while we are excited that Flexible Sync is now in preview, we’re nowhere near done. The Realm Sync team plans to bring you more query operators, permissions integrations, and enhancements over the course of 2022. We look to you, our users, to help us drive the roadmap: Submit your ideas and feature requests to our feedback portal and ask questions in our Community forums. Happy building!

January 24, 2022
Updates
