Introducing the MongoDB 5.2 Rapid Release

Chirag Shah

MongoDB allows you to address a wide variety of data workloads using a single API. Our latest Rapid Release, MongoDB 5.2, builds on this vision with improved query ergonomics, enhancements to time series collections (introduced in MongoDB 5.0), better scaling and operational resilience, and new capabilities that let teams run more sophisticated analytics in place.

Columnar Compression for Time Series Collections

Introduced in MongoDB 5.0, time series collections allow you to easily ingest and work with time series data alongside your operational or transactional data, without the need to integrate a separate single-purpose database into your environment. The 5.2 Rapid Release introduces columnar compression for time series collections in MongoDB.

Time series use cases, whether device monitoring, trendspotting, or forecasting, require new data to be inserted into the database for every measurement. When data is created continuously, the sheer volume can be staggering, making an ever-growing storage footprint difficult to manage.

To help teams achieve high performance while maintaining resource efficiency, we’ve added the following capabilities to time series collections:

  • New columnar compression for time series collections can reduce your database storage footprint by as much as 70%, using best-in-class compression techniques such as delta encoding, delta-of-delta encoding, Simple-8b, and run-length encoding, among others.

  • For teams using MongoDB Atlas, Atlas Online Archive support for time series collections (introduced with the 5.1 Rapid Release) allows them to define archiving policies to automatically move aged data out of the database and into lower-cost, fully managed cloud object storage.
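To see why regularly spaced measurements compress so well, the toy Python sketch below applies two of the techniques named above, delta-of-delta encoding followed by run-length encoding, to a list of timestamps. It illustrates the general technique under our own assumptions; it is not MongoDB’s storage-engine code, which also bit-packs the results (for example with Simple-8b). The function names and sample timestamps are hypothetical.

```python
from itertools import groupby


def delta_of_delta(values):
    """Encode integers as (first value, first delta, deltas-of-deltas)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    deltas = [b - a for a, b in zip(values, values[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return values[0], deltas[0], dods


def run_length_encode(values):
    """Collapse runs of identical values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical sensor timestamps (seconds): one reading every 10 s ...
regular = list(range(1000, 1080, 10))
# ... and the same series with one reading arriving 1 s late.
jittered = [1000, 1010, 1020, 1030, 1041, 1050, 1060, 1070]

for series in (regular, jittered):
    first, first_delta, dods = delta_of_delta(series)
    print(first, first_delta, run_length_encode(dods))
```

Eight perfectly regular timestamps collapse to a start value, one delta, and a single (0, 6) run; even with jitter, the residuals stay small integers, which is what makes bit-packing schemes effective.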

Better Query Ergonomics and Point-in-Time Queries for Operational Analytics

More efficient queries make developers’ lives easier. The MongoDB 5.2 Rapid Release introduces new operators and enhancements that increase productivity, improve query performance, and reduce the number of queries needed to unlock insights. They also let teams push more work down to the database, reducing the amount of code developers need to write and maintain and limiting how much data has to be returned to and manipulated in applications.

New accumulators and an expression for sorting arrays

MongoDB 5.2 brings new operators that streamline your queries. The $top and $bottom accumulators compute the top and bottom elements of a group according to a sort order, and return related fields within the same query without complex logic. For example, suppose you are analyzing sales performance and want the top salesperson in every region, along with their sales. With $top, you can retrieve these results in a single query, including any additional fields from the original documents.

{$group: {
  _id: "$region",
  person: {
    $top: {
      output: ["$name", "$sales"],
      sortBy: {sales: -1}
    }
  }
}}


Result:
[
  {_id: "amer", person: ["Daniel LaRousse", 100000]},
  {_id: "emea", person: ["John Snow", 1]},
  {_id: "latam", person: ["Frida Kahlo", 99]}
]
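To make the semantics of that $group stage concrete, the Python sketch below writes the same stage as the list you would pass to PyMongo’s Collection.aggregate and pairs it with a plain in-memory approximation of $top for a single descending sort key. The extra sample documents, the collection name, and the helper name are hypothetical. The sketch breaks ties by keeping the first document it sees, which may differ from the server’s behavior.

```python
# The $group stage from the example, as a PyMongo pipeline, e.g.
#   list(db.sales.aggregate(pipeline))   # "db.sales" is a hypothetical collection
pipeline = [
    {"$group": {
        "_id": "$region",
        "person": {"$top": {"output": ["$name", "$sales"], "sortBy": {"sales": -1}}},
    }}
]


def top_per_group(docs, group_field, sort_field, output_fields):
    """In-memory approximation of $group + $top with sortBy {sort_field: -1}."""
    best = {}
    for doc in docs:
        key = doc.get(group_field)  # a missing group field groups under None (null)
        current = best.get(key)
        # Keep the document with the highest sort value; ties keep the first seen.
        if current is None or doc[sort_field] > current[sort_field]:
            best[key] = doc
    return [{"_id": key, "person": [doc[f] for f in output_fields]}
            for key, doc in best.items()]


sales = [  # hypothetical sample data
    {"region": "amer", "name": "Daniel LaRousse", "sales": 100000},
    {"region": "amer", "name": "Ana Ortiz", "sales": 5000},
    {"region": "emea", "name": "John Snow", "sales": 1},
    {"region": "latam", "name": "Luis Gomez", "sales": 42},
    {"region": "latam", "name": "Frida Kahlo", "sales": 99},
]
result = top_per_group(sales, "region", "sales", ["name", "sales"])
print(result)
```

MongoDB does not guarantee the order of $group output, so compare results per _id rather than by position.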

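We are also introducing the $maxN, $minN, $firstN, and $lastN accumulators. $maxN and $minN return the n largest or smallest values in a group, while $firstN and $lastN return the first or last n elements, taking into account the current order of documents in the dataset.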
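As a hedged illustration, the sketch below shows how these accumulators might appear in a PyMongo pipeline (the date, region, amount, and customer fields are hypothetical), together with small in-memory helpers that mirror their semantics. The helpers highlight the distinction above: $maxN and $minN depend only on the values, whereas $firstN and $lastN depend on the order in which documents reach the $group stage, which is why a $sort usually precedes them. As we understand the operators, $maxN and $minN skip null values, so the helpers do the same; BSON comparison across mixed types is not modeled.

```python
import heapq

pipeline = [
    {"$sort": {"date": 1}},  # fixes the order that $firstN / $lastN observe
    {"$group": {
        "_id": "$region",
        "largestOrders": {"$maxN": {"input": "$amount", "n": 3}},
        "smallestOrders": {"$minN": {"input": "$amount", "n": 3}},
        "firstCustomers": {"$firstN": {"input": "$customer", "n": 2}},
        "lastCustomers": {"$lastN": {"input": "$customer", "n": 2}},
    }},
]


def max_n(values, n):
    """n largest non-null values, largest first; input order is irrelevant."""
    return heapq.nlargest(n, (v for v in values if v is not None))


def min_n(values, n):
    """n smallest non-null values, smallest first; input order is irrelevant."""
    return heapq.nsmallest(n, (v for v in values if v is not None))


def first_n(values, n):
    """First n values in arrival order (nulls included)."""
    return list(values)[:n]


def last_n(values, n):
    """Last n values in arrival order (nulls included)."""
    values = list(values)
    return values[max(len(values) - n, 0):]
```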

A highly requested feature, the new $sortArray expression lets you sort the elements of an array directly in your aggregation pipeline. The input array can be as simple as an array of scalars or as complex as an array of documents with embedded subdocuments. Suppose your product reviews are stored sorted by timestamp, but you now want to sort them by user rating: with $sortArray you can change the sort criteria inside the pipeline, with no additional application code.

Sorting an array of integers

{$sortArray: {
  input: [3, 1, 4, 1, 5, 9],
  sortBy: 1
}}
 
Result: [1, 1, 3, 4, 5, 9]

Sorting arrays of documents

{
 "team": [
   {
     "name": "kyle",
     "age": 28,
     "address": { "street": "12 Baker St", "city": "London" }
   },
   {
     "name": "bill",
     "age": 42,
     "address": { "street": "12 Blaker St", "city": "Boston" }
   }
  ]
}

A simple sort: "name" ascending

{$project: {
	_id: 0,
	result: {
		$sortArray: {
			input: "$team",
			sortBy: {name: 1}
		}
	}
}}
 
Output: {
 "result": [
   {
     "name": "bill",
     "age": 42,
     "address": { "street": "12 Blaker St", "city": "Boston" }
   },
   {
     "name": "kyle",
     "age": 28,
     "address": { "street": "12 Baker St", "city": "London" }
   }
 ]
}
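Sort specifications can also be compound and can reach into subdocuments with dotted paths. The sketch below expresses the product-review scenario from earlier as a PyMongo pipeline (the reviews array and its rating and timestamp fields are hypothetical) and adds an in-memory approximation of $sortArray for arrays of documents. It assumes every sorted field is present and of a single comparable type; MongoDB’s full BSON comparison order for missing or mixed-type values is not modeled.

```python
pipeline = [
    {"$project": {
        "_id": 0,
        "reviews": {"$sortArray": {
            "input": "$reviews",
            "sortBy": {"rating": -1, "timestamp": -1},  # best-rated first, newest first on ties
        }},
    }}
]


def get_path(doc, path):
    """Resolve a dotted path such as 'address.city' inside nested dicts."""
    for part in path.split("."):
        doc = doc[part]
    return doc


def sort_array(items, sort_by):
    """Approximate $sortArray on a list of dicts with a {path: 1 | -1} spec."""
    result = list(items)
    # list.sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last gives a compound order.
    for path, direction in reversed(list(sort_by.items())):
        result.sort(key=lambda doc: get_path(doc, path), reverse=(direction == -1))
    return result
```

Applied to the team array above, sort_array(team, {"address.city": -1}) puts kyle (London) before bill (Boston).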

Long-running snapshot queries now generally available

Your applications can now run complex analytical queries against a globally and transactionally consistent snapshot of your live, operational data. Even as the underlying data changes, MongoDB preserves the point-in-time consistency of the query results returned to your users, without you having to implement complex reconciliation logic in your application code. In MongoDB Atlas, the default window for long-running snapshot queries is 5 minutes; our support team can change it on request.
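The idea behind point-in-time reads can be shown with a toy multi-version store: every write is kept with a commit timestamp, a reader pins a snapshot timestamp, and later writes stay invisible to that reader until the snapshot falls outside a bounded history window. The Python sketch below is a conceptual model under those assumptions, not how MongoDB or WiredTiger implement snapshots. It counts logical ticks instead of the 5-minute wall-clock window, and its SnapshotTooOld exception is only named after the error MongoDB reports when a snapshot has expired. In application code you would instead request the "snapshot" read concern, for example through PyMongo’s client.start_session(snapshot=True).

```python
import bisect


class SnapshotTooOld(Exception):
    """Raised when a snapshot is older than the retained history."""


class VersionedStore:
    """Toy multi-version key-value store; the window check stands in for pruning."""

    def __init__(self, history_window):
        self.history_window = history_window  # measured in logical ticks
        self._clock = 0
        self._stamps = {}  # key -> ascending commit timestamps
        self._values = {}  # key -> values, parallel to self._stamps[key]

    def write(self, key, value):
        self._clock += 1
        self._stamps.setdefault(key, []).append(self._clock)
        self._values.setdefault(key, []).append(value)
        return self._clock

    def snapshot(self):
        """Pin the current point in time."""
        return self._clock

    def read_at(self, key, ts):
        """Return the value of key as of timestamp ts (None if absent then)."""
        if ts < self._clock - self.history_window:
            raise SnapshotTooOld(f"snapshot {ts} is older than the history window")
        i = bisect.bisect_right(self._stamps.get(key, []), ts)
        return self._values[key][i - 1] if i else None


store = VersionedStore(history_window=5)
store.write("alice", 100)
store.write("bob", 50)
snap = store.snapshot()   # consistent view: alice=100, bob=50
store.write("alice", 70)  # a transfer of 30 from alice to bob
store.write("bob", 80)    # lands after the snapshot was pinned
print(store.read_at("alice", snap) + store.read_at("bob", snap))  # 150
# Mixing points in time yields a "torn" total that never existed:
print(store.read_at("alice", snap) + store.read_at("bob", store.snapshot()))  # 180
```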

Queries can span multiple shards, unlocking analytics against large, distributed data sets. By routing long-running queries to secondaries, you can isolate analytics from transactional queries with both workloads served by the same cluster, avoiding slow, complex, and expensive ETL to data warehouses.

Query results can be returned directly to the application or cached in a materialized view, providing your users with low-latency access to deep analytics. Typical uses include end-of-day reconciliation and reporting, along with ad-hoc data exploration and mining. All of these use cases can now be served directly from your transactional data layer, dramatically simplifying the data infrastructure you need to serve multiple classes of workloads.
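One common way to cache aggregation results as an on-demand materialized view is to end the pipeline with a $merge stage, which writes each result document into a target collection. The sketch below uses hypothetical collection and field names (transactions, daily_sales_summary, status, region, amount) and pairs the PyMongo pipeline with an in-memory model of how three of $merge’s whenMatched modes treat documents matched on _id. It is illustrative only and does not cover every $merge option.

```python
# End-of-day report, e.g. db.transactions.aggregate(pipeline) with PyMongo.
pipeline = [
    {"$match": {"status": "settled"}},
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}, "count": {"$sum": 1}}},
    {"$merge": {
        "into": "daily_sales_summary",
        "on": "_id",
        "whenMatched": "replace",
        "whenNotMatched": "insert",
    }},
]


def merge_into(view, results, when_matched="replace"):
    """Model $merge on _id: view maps _id -> document, results are new documents."""
    for doc in results:
        key = doc["_id"]
        if key not in view:                      # whenNotMatched: "insert"
            view[key] = dict(doc)
        elif when_matched == "replace":          # new document replaces the old one
            view[key] = dict(doc)
        elif when_matched == "merge":            # new fields overwrite, others are kept
            view[key] = {**view[key], **doc}
        elif when_matched == "keepExisting":     # existing document is left untouched
            pass
        else:
            raise ValueError(f"unsupported whenMatched mode: {when_matched}")
    return view
```

Re-running the report each evening refreshes the summary collection in place, so dashboards read small precomputed documents instead of re-aggregating raw transactions.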

Improving Resilience with Faster Initial Sync via File Copy

Initial sync is the process by which a replica set member loads a full copy of the data from an existing member. It runs when users add new nodes to a replica set, whether to improve resilience or to reduce read latency and improve read scalability with secondary reads. Initial sync is also commonly used to recover replica set members that have fallen too far behind the other members of the cluster. Before 5.2, logical initial sync was the only option. With logical initial sync, every collection on the source node is scanned and all of its documents are inserted into matching collections on the target node, with indexes built as the documents are inserted. However, users, especially those synchronizing large data sets, have reported frustratingly long initial sync times.

Starting with the 5.2 Rapid Release, you can instead perform an initial sync via file copy, which can significantly improve initial sync performance. With this method, MongoDB copies files from the file system of the source node to the file system of the target node. This can be faster than a logical initial sync, especially at larger data sizes: in our testing with a 630 GB dataset, initial sync via file copy was nearly four times (4X) faster than a logical initial sync of the same dataset. This capability builds on our ongoing work to improve resilience and scalability, including two enhancements introduced in MongoDB 4.4: initial sync that automatically resumes after a network failure, and the ability to specify a preferred initial sync source.
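For self-managed deployments, the sync method is chosen with a server parameter at startup. The Python sketch below assembles a hypothetical mongod command line. The parameter names initialSyncMethod (with the value fileCopyBased) and initialSyncSourceReadPreference reflect our understanding of the server documentation; the paths, replica set name, and helper name are placeholders. Check the initial sync documentation for supported editions, prerequisites, and restrictions before relying on this.

```python
import shlex


def mongod_args(dbpath, repl_set, file_copy_sync=True, sync_source_pref=None):
    """Build a mongod argument list for a new replica set member (illustrative)."""
    args = ["mongod", "--dbpath", dbpath, "--replSet", repl_set]
    if file_copy_sync:
        # Logical initial sync is the default; this opts in to file copy (5.2+).
        args += ["--setParameter", "initialSyncMethod=fileCopyBased"]
    if sync_source_pref:
        # Preferred sync source, configurable since 4.4 (e.g. "secondaryPreferred").
        args += ["--setParameter", f"initialSyncSourceReadPreference={sync_source_pref}"]
    return args


args = mongod_args("/data/rs0-node3", "rs0", sync_source_pref="secondaryPreferred")
print(shlex.join(args))
```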

For more information, see the documentation on initial sync.

Enhanced Developer Experience with MongoDB Analyzer for .NET

And finally, we’re pleased to announce the release of the MongoDB Analyzer for .NET, which helps C# developers troubleshoot queries and aggregations more easily and catch errors before they surface at runtime. The MongoDB Analyzer builds on earlier releases of the MongoDB .NET driver, which made it easier and faster for developers to use MongoDB with C#, including through a fully redesigned LINQ interface.

Previously, C# developers could interact with MongoDB idiomatically using Builders or LINQ expressions, but there was no easy way to check, before running their code, whether those expressions mapped correctly to the MongoDB Query API. Downloadable as a NuGet package, the MongoDB Analyzer lets developers see whether their queries and aggregations correspond to expressions supported by the Query API. By surfacing unsupported expressions during development, the MongoDB Analyzer improves developer productivity and reduces the pain of debugging.

Getting Started with MongoDB 5.2

MongoDB 5.2 is available now. If you are running Atlas Serverless instances or have opted in to receive Rapid Releases on your dedicated Atlas cluster, your deployment will be automatically updated to 5.2 starting today. For a short period after the upgrade, the Feature Compatibility Version (FCV) will remain at 5.1, and certain 5.2 features will not be available until we increment the FCV. MongoDB 5.2 is also available as a Development Release, for evaluation purposes only, from the MongoDB Download Center. Consistent with the new release cadence we announced last year, the functionality available in 5.2 and the subsequent Rapid Releases will all roll up into MongoDB 6.0, our next Major Release, scheduled for delivery later this year.
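Because some 5.2 features stay unavailable until the FCV is raised, it can help to check the FCV before relying on them. The sketch below shows the getParameter command document for reading the FCV (with PyMongo you could send it via client.admin.command) and a small helper that compares the reported version with the one a feature needs. The helper names and sample replies are hypothetical.

```python
# Command document for reading the FCV, e.g. client.admin.command(FCV_COMMAND) in PyMongo.
FCV_COMMAND = {"getParameter": 1, "featureCompatibilityVersion": 1}


def parse_version(version):
    """Turn '5.2' into (5, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def fcv_at_least(reply, required):
    """True if the FCV in a getParameter reply is at least `required`."""
    current = reply["featureCompatibilityVersion"]["version"]
    return parse_version(current) >= parse_version(required)


before = {"featureCompatibilityVersion": {"version": "5.1"}, "ok": 1.0}
after = {"featureCompatibilityVersion": {"version": "5.2"}, "ok": 1.0}
print(fcv_at_least(before, "5.2"), fcv_at_least(after, "5.2"))  # False True
```

Comparing tuples rather than strings avoids surprises such as "5.10" sorting before "5.2".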

Safe Harbour Statement

The development, release, and timing of any features or functionality described for our products remain at our sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality.