MongoDB 4.2 Performance Boosts

Dj Walker-Morgan

Every performance improvement counts when it comes to making your applications run more smoothly and giving your users a better experience. You are probably wondering what has changed in MongoDB 4.2 with respect to performance. I was too, so I sat down with the engineering team to talk about some of the speedups you'll find.

Small updates, big speedup

If you have large documents, around 10MB and above, small updates to those documents could make the server perform unnecessary memory management and reduce overall performance. Engineering diagnosed the problem and created a fix that improved the performance of these particular operations by up to two orders of magnitude. The fix was so good that we also backported it to MongoDB 4.0.7 and 3.6.11.

Accelerated Aggregation

Aggregation is a massively powerful feature in MongoDB, and one that engineering is constantly optimizing. Oddly, one of the 4.2 optimizations is to not use aggregation at all. The server is now smart enough to detect when a small aggregation, such as a simple projection and match like

[{$project: {excluded: 0}}, {$match: {predicate: true}}]

could be performed more efficiently by the already heavily optimized query system. In that situation, it transparently turns the small aggregation into a fast query that doesn't need to power up a pipeline, while leaving bigger aggregations free to make full, optimal use of the fast streaming pipelines in the aggregation framework.
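The server's exact conditions for this rewrite aren't spelled out here, so treat the following strictly as a toy model. `pipeline_as_find` is a hypothetical Python helper that takes a pipeline written as a pymongo-style list of dicts and returns a find() filter and projection when every stage is either a `$match` on plain field names or an exclusion-only `$project`, and returns None (leave it to the pipeline) otherwise. It shows why the example above qualifies, and why matching on a field that an earlier stage excluded would not.

```python
from typing import Any, Dict, List, Optional, Tuple

Stage = Dict[str, Any]


def _overlaps(field: str, excluded: Dict[str, int]) -> bool:
    """True if `field` is an excluded path, or a parent or child of one."""
    return any(
        field == ex or field.startswith(ex + ".") or ex.startswith(field + ".")
        for ex in excluded
    )


def pipeline_as_find(
    pipeline: List[Stage],
) -> Optional[Tuple[Dict[str, Any], Dict[str, int]]]:
    """Toy pipeline-to-query rewrite (hypothetical, not the server's rules).

    Returns (filter, projection) if every stage is a $match on plain field
    names or an exclusion-only $project; otherwise returns None.
    """
    query_filter: Dict[str, Any] = {}
    projection: Dict[str, int] = {}
    for stage in pipeline:
        if len(stage) != 1:
            return None
        op, spec = next(iter(stage.items()))
        if op == "$match":
            for field, condition in spec.items():
                # Give up on top-level operators ($or, $expr, ...), repeated
                # fields, and fields an earlier $project already removed: in a
                # pipeline those fields look missing, which find() can't mimic.
                if field.startswith("$") or field in query_filter or _overlaps(field, projection):
                    return None
                query_filter[field] = condition
        elif op == "$project" and all(v in (0, False) for v in spec.values()):
            projection.update({field: 0 for field in spec})
        else:
            return None  # $group, $lookup, inclusion projections, ...
    return query_filter, projection
```

For the example pipeline, `pipeline_as_find([{"$project": {"excluded": 0}}, {"$match": {"predicate": True}}])` returns `({"predicate": True}, {"excluded": 0})`, the equivalent of `find({"predicate": True}, {"excluded": 0})`; a pipeline containing `$group` returns None.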

The art of optimization is in avoiding unnecessary work and remembering what you've done. For example, one speedup in 4.2 exploits the fact that a pipeline's content may already be sorted when it comes to grouping the content and pulling out the first entry of each group. Previously, the server would perform a full index scan; now it uses the sorted order of the data and the grouping to skip through the data.
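The idea behind this speedup can be modelled in a few lines. This is a toy, not the server's code: the rows stand in for index entries sorted by a grouping key, `first_per_group_scan` reads every row the way a full scan would, and the hypothetical `first_per_group_skip` takes the first row of a group and then binary-searches past the rest of it. In MongoDB terms, this is roughly the work behind a pipeline shaped like `[{"$sort": {"k": 1}}, {"$group": {"_id": "$k", "first": {"$first": "$v"}}}]` with an index on `k`; the field names are illustrative.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, Any]  # (group key, value); the list is sorted by key


def first_per_group_scan(rows: List[Row]) -> Tuple[Dict[Any, Any], int]:
    """Baseline: consume every row and keep the first value seen per key."""
    firsts: Dict[Any, Any] = {}
    rows_read = 0
    for key, value in rows:
        rows_read += 1
        firsts.setdefault(key, value)
    return firsts, rows_read


def _seek_past(rows: List[Row], key: Any, lo: int) -> int:
    """Binary search from `lo` for the first row whose key is greater than `key`."""
    hi = len(rows)
    while lo < hi:
        mid = (lo + hi) // 2
        if rows[mid][0] <= key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def first_per_group_skip(rows: List[Row]) -> Tuple[Dict[Any, Any], int]:
    """Sorted input: consume one row per group, then jump to the next group."""
    firsts: Dict[Any, Any] = {}
    rows_read = 0
    i = 0
    while i < len(rows):
        key, value = rows[i]
        rows_read += 1
        firsts[key] = value
        i = _seek_past(rows, key, i + 1)  # skip the rest of this group
    return firsts, rows_read
```

For the rows `[("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5), ("c", 6)]`, both functions return `{"a": 1, "b": 4, "c": 5}`, but the scan consumes 6 rows while the skip version consumes 3, one per group, plus a logarithmic number of key comparisons for each jump. The fewer groups relative to rows, the bigger the saving.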

Other optimizations make better use of index data when aggregating, in a similar way to covered queries, and handle sparse indexes in the pipeline more efficiently.

Array Up

Large unsorted arrays and the $in operator were another target for optimization in 4.2. By their nature they are tricky to handle efficiently, but engineering knew they could get better performance out of them. Approaching the challenge on a number of different fronts, the team produced a solution that showed four to five times better performance in testing with ten-thousand-element arrays and around a twenty to thirty percent improvement with one-thousand-element arrays. You'll still get better performance from sorted arrays, so if you can sort them, do. This is another case where the improvement was so good we couldn't keep it to 4.2: the optimization was backported to MongoDB 3.2.21, 3.4.16, 3.6.6, and 4.0.1.
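Following the "if you can sort them, do" advice is cheap on the client side. A minimal sketch, assuming the arrays in question are the value lists you pass to `$in`, that filters are written as pymongo-style dicts, and that the values are mutually comparable in Python (all integers or all strings, say; Python's ordering is not BSON's cross-type ordering). `in_filter` is a hypothetical helper, not part of any driver.

```python
from typing import Any, Dict, Iterable, List


def in_filter(field: str, values: Iterable[Any]) -> Dict[str, Dict[str, List[Any]]]:
    """Build a {field: {"$in": [...]}} filter with the values pre-sorted.

    Hypothetical helper: assumes the values can be compared with each other.
    """
    return {field: {"$in": sorted(values)}}
```

`in_filter("sku", [30, 10, 20])` gives `{"sku": {"$in": [10, 20, 30]}}`, which could then be passed to a pymongo collection's `find()`. Sorting doesn't change which documents match, since `$in` is a set-membership test.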

Optimizing Index Builds

One new feature that was entirely about optimization from the start is Optimized Index Builds. The goal of this new feature was to deliver index builds combining the performance of 4.0's foreground index builds with the non-blocking behavior and low performance impact of 4.0's background index builds. It was a goal the engineering team more than met.

The ramifications of this enhancement don't just affect how indexes are built. By reducing the resource cost of indexing, the changes also lowered the cost of inserting new entries into indexes. In performance testing, that showed up as reduced times when loading indexes and quicker indexing of inserted documents.

We'll be talking about Optimized Index Builds, and taking a dive into making the most of them, in a future article on the blog.

The only slowdown

The performance team did find some slowdowns or, more precisely, one slowdown of any note. Testing revealed a particular case affecting collections with unique indexes other than the _id index: bulk insertions of new documents whose values for the unique index fields are already unique could be slowed down by up to 22%. The overhead has been identified as the search for duplicates, which returns nothing. The impact of this regression scales with the number of unique indexes on the collection.
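Since the regression scales with the number of unique indexes other than _id, it may be worth checking how many a bulk-loaded collection has. A minimal sketch, assuming the input is shaped like the dict returned by pymongo's `Collection.index_information()` (index name mapped to a spec with a `"key"` list of `(field, direction)` pairs and `"unique": True` on unique indexes); the function name and the sample index names are hypothetical.

```python
from typing import Any, Dict


def secondary_unique_index_count(index_info: Dict[str, Dict[str, Any]]) -> int:
    """Count unique indexes other than the _id index.

    `index_info` is assumed to be in the shape returned by pymongo's
    Collection.index_information().
    """
    return sum(
        1
        for spec in index_info.values()
        if spec.get("unique") and list(spec.get("key", [])) != [("_id", 1)]
    )


# Sample data in that shape (hypothetical index names).
example_info = {
    "_id_": {"v": 2, "key": [("_id", 1)]},
    "email_1": {"v": 2, "key": [("email", 1)], "unique": True},
    "sku_1_region_1": {"v": 2, "key": [("sku", 1), ("region", 1)], "unique": True},
    "created_1": {"v": 2, "key": [("created", 1)]},
}
```

For `example_info`, the count is 2. Against a live collection, the equivalent call would be `secondary_unique_index_count(collection.index_information())`.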

That this was the only slowdown of note in the 4.2 release shows how well MongoDB 4.2 development went, with the team monitoring the entire development cycle using an extensive test suite that tested for any and all regressions (and unexpected speedups). Apart from the bulk insert/unique index regression described above, all regressions found during this process were fixed before the final release of 4.2.

Wrapping up

We're constantly optimizing MongoDB to give you the best possible performance, and MongoDB 4.2 is no different. New features and new tune-ups mean your database will be running better than ever with 4.2.