Here are some great articles about MongoDB to read this weekend:
ScaleGrid: Should You Enable MongoDB Journaling?, 10/18
Business Insider: MongoDB co-founder Dwight Merriman and CEO Max Schireson chosen for the Silicon Alley Top 100 in the New York tech roundup, October
InfoWorld: Use MongoDB to Make Your App Location-Aware, 10/24
comSysto: Getting Started with MongoSoup, 10/25
Digital Misinformation: Collections and Embedded Documents in MongoDB, 10/22
Performance Tuning MongoDB on SolidFire
This is a guest post by Chris Merz & Garrett Clark, SolidFire.

We recently had a large enterprise customer implement a MongoDB sharded cluster on SolidFire as the backend for a global e-commerce system. By leveraging solid-state drive technology with features like storage virtualization, Quality of Service (guaranteed IOPS per volume), and horizontal scaling, the customer was looking to combine the benefits of dedicated storage performance with the simplicity and scalability of a MongoDB environment.

During the implementation the customer reached out to us with some performance and tuning questions, requesting our assistance with the configuration. After meeting with the team and reviewing the available statistics, we discovered response times that were out of range for the application's performance requirements: ~13-20ms, with an average of 15-17ms. While this is acceptable latency in many implementations, the team was targeting <5ms average query response times.

When troubleshooting any storage latency issue it is important to focus on two critical aspects of the end-to-end chain: potential I/O queue depth bottlenecks and the main contributors to the overall latency in the chain. A typical end-to-end sequence with attached storage can be described by:

MongoDB > OS > NIC > Network > Storage > Network > NIC > OS > MongoDB

First, we looked for I/O queue depth bottlenecks and found one at the operating system layer. MongoDB was periodically sending an I/O queue depth of >100 to the operating system and, by default, iSCSI could only release a queue depth of 32 per iSCSI session. This drop from >100 to 32 caused frames to stall at the operating system layer while they waited to continue down the chain.
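The queue depth mismatch is simple arithmetic, and it points directly at the fix. A small sketch using the figures above (the per-session depth of 32 and the >100 bursts come from the measurements; the variable names are ours):

```python
import math

DEPTH_PER_ISCSI_SESSION = 32   # default queue depth an iSCSI session can release
BURST_QUEUE_DEPTH = 100        # depth MongoDB was periodically sending to the OS

# With a single session, most of each burst stalls at the OS layer
stalled = BURST_QUEUE_DEPTH - DEPTH_PER_ISCSI_SESSION
print(f"{stalled} I/Os queued behind a single session")

# Sessions needed for the iSCSI layer to absorb a whole burst
sessions_needed = math.ceil(BURST_QUEUE_DEPTH / DEPTH_PER_ISCSI_SESSION)
effective_depth = sessions_needed * DEPTH_PER_ISCSI_SESSION
print(f"{sessions_needed} sessions -> effective depth {effective_depth}")
```

With bursts of >100 against a per-session limit of 32, four sessions is the smallest count that lets the whole burst through.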
We alleviated the issue by increasing the number of iSCSI sessions to the volume from 1 to 4, which proportionally increased the queue depth exiting the operating system to 128 (32*4). This enabled all frames coming off the application layer to pass immediately through the operating system and NIC, and decreased the overall latency from ~15ms to ~4ms.

Although the average latency was now 4ms, performance was still rather variable. We then turned our focus to pinpointing the sources of the remaining end-to-end latency, which we broke down into three latency loops:

First, the complete chain: MongoDB > OS > NIC > Network > Storage > Network > NIC > OS > MongoDB. This loop took an average of 3.9ms to complete.

Second, the subset loop: OS > NIC > Network > Storage > Network > NIC > OS. This loop took ~1.1ms to complete. We determined its latency from the output of "iostat -xk 1", grepping for the corresponding volume.

The last loop segment, latency on the storage subsystem itself, was 0.7ms, obtained through a polling API command issued to the SolidFire unit.

Our analysis pointed to the first layers of the stack contributing the most significant share (>70%) of the end-to-end latency, so we decided to start there and continue downstream. We reviewed the OS configuration and tuning with an eye towards both SolidFire/iSCSI best practices and MongoDB performance. Several OS-level tunables were found that could be tweaked to ensure optimal throughput for this type of deployment. Unfortunately, none of them produced any major reduction in end-to-end latency for MongoDB. Having eliminated the obvious, we were left with what remained: MongoDB itself.
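The three loop measurements above decompose into per-layer contributions, and a quick back-of-the-envelope check shows why the host side was the place to start (the millisecond figures are from the measurements; the variable names are ours):

```python
# Loop latencies measured above, in milliseconds
end_to_end_ms = 3.9   # MongoDB > OS > NIC > Network > Storage > ... > MongoDB
os_loop_ms = 1.1      # OS > NIC > Network > Storage > ... > OS (via iostat -xk 1)
storage_ms = 0.7      # storage subsystem alone (SolidFire polling API)

# Whatever the full chain spends beyond the OS loop sits in the host:
# MongoDB itself plus the crossing into the OS
host_ms = end_to_end_ms - os_loop_ms
network_nic_ms = os_loop_ms - storage_ms

print(f"host side: {host_ms:.1f} ms ({host_ms / end_to_end_ms:.0%} of total)")
print(f"network + NIC: {network_nic_ms:.1f} ms, storage: {storage_ms:.1f} ms")
```

Roughly 2.8ms of the 3.9ms total, over 70%, sat above the OS loop, which is what pointed the investigation at the database layer rather than the network or the storage unit.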
A phrase oft-quoted by the famous fictional detective Sherlock Holmes came to mind: "when you have eliminated the impossible, whatever remains, however improbable, must be the truth."

Upon going over the collected statistics runs with a fine-toothed comb, we noticed that the latency spikes had intervals of almost exactly 60 seconds. That's when the light bulb went off: the MongoDB flush interval.

The architecture of MongoDB was developed in the context of spinning disk, a vastly slower storage technology requiring batched file syncs to minimize query latency. The syncdelay setting defaults to 60 seconds for this very reason. The documentation clearly states: "In almost every situation you should not set this value and use the default setting." 'Almost' was the key to our solution in this particular case. It should be noted that changing syncdelay is an advanced tuning, and should be carefully evaluated and tested on a per-deployment basis.

Little's Law (IOPS = Queue Depth / Latency) indicated that lowering the flush interval would reduce the variance in queue depth, thereby smoothing the overall latency. In lab testing we had found that, under maximum load, decreasing syncdelay to 1 second would force a 'continuous flush' behavior, usually repeating every 6-7 seconds, reducing I/O spikes in the storage path. We had seen this as a useful technique for controlling IOPS throughput variability, but had not typically viewed it as a latency reduction technique.

It worked! After implementing the change, the customer excitedly reported average end-to-end MongoDB response times of 1.2ms, with a throughput of ~4-5k IOPS per mongod (normal for this application) and no obvious increase in extraneous I/O.
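Little's Law can be sanity-checked against this deployment's own numbers. The IOPS and latency figures below are from the results above; the interval itself would typically be set via mongod's --syncdelay option (storage.syncPeriodSecs in a YAML config file) and, as noted, only after careful per-deployment testing:

```python
def avg_queue_depth(iops: float, latency_s: float) -> float:
    """Little's Law: average queue depth = arrival rate (IOPS) * latency."""
    return iops * latency_s

# At the app's normal ~4,500 IOPS per mongod:
before = avg_queue_depth(4500, 0.015)    # ~67 outstanding I/Os at the original ~15 ms
after = avg_queue_depth(4500, 0.0012)    # ~5 outstanding I/Os at the final 1.2 ms
print(f"queue depth before: {before:.1f}, after: {after:.1f}")
```

For a fixed arrival rate, queue depth and latency move together, which is why smoothing the once-a-minute flush bursts into small continuous flushes smoothed the latency as well.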
By increasing the number of iSCSI sessions, normalizing the flush rate, and removing the artificial 60s buffer, we reduced average latency by more than an order of magnitude, proving out the architecture at scale in a global production environment. Increasing the iSCSI sessions increased parallelism and decreased latency by 3.5-4x. The reduction in syncdelay smoothed the average queue depth being sent to the storage system, decreasing latency by slightly more than 3x.

This customer's experience is a good example of how engaging the MongoDB team early on can ensure a smooth product launch. As of today, we're excited to announce that SolidFire is a MongoDB partner. Learn more about the SolidFire and MongoDB integration on our Database Solutions page. To learn more about performance tuning MongoDB on SolidFire, register for our upcoming webinar on November 6 with MongoDB. For more information on MongoDB performance, check out Performance Considerations for MongoDB, which covers other topics such as indexes and application patterns and offers insight into achieving MongoDB performance at scale.
What the C-Suite Should Know About Data Strategy for 2023
Trying to predict the future is obviously fraught with difficulty. Anything can happen. Just look at the past few years, where it seemed like everything and anything did happen. With us now in the second month of 2023, and the rest of the year shaping up to be one of potentially big changes and disruptions, the only clear indicator of what's to come is what we've seen trending in the months, weeks, and days preceding this new year.

So, with that said, here are five things the C-suite is likely to see more of as 2023 progresses, and what it all means for building a resilient, enduring, and innovative data strategy.

1. Software may still be eating the world, but developers are eating all the work

Almost 12 years ago, Marc Andreessen proclaimed, "software is eating the world." And while that sentiment still holds true today, the biggest beneficiaries of software's global appetites will continue to be developers. In an interview with The Cube at last year's AWS re:Invent, MongoDB CEO Dev Ittycheria put it this way: "It's almost a cliche to say now that software is eating the world. Because every company's value proposition is driven by software. But what that really means is developers are eating all the work."

One of the best examples of developers "eating all the work" is DevOps. At the advent of DevOps, we saw software development teams incorporate the previously separate domain of IT operations into their work, while turning infrastructure into a programmable interface and creating a continuous feedback loop that improved developer agility. But DevOps was just the start. We're now seeing developers embed other previously separate domains into their work, such as security, data science, and data analytics (more on that below). The business implications of embedding these previously disparate domains into software development are huge.
It means rapid innovation, faster time-to-market, better fraud detection and prevention, A/B testing: the list goes on and on. With software continuing to eat the world, developers are continuing to eat all the work while also taking massive bites out of silos.

2. Builder teams will require less and less complexity

With software development teams taking on more work, we're also going to see the need to reduce complexity, particularly when it comes to bolt-on solutions. Search is a good example here. For a lot of teams out there, database operations and search have traditionally been two separate systems glued together, which doesn't usually decrease complexity. In fact, the opposite happens: teams end up managing dependencies across systems. But when teams have access to a single, unified, and fully managed platform that integrates the database, search engine, and sync mechanism, you remove the need for glue and the complexity goes way down. As Andrew Davidson, SVP of products at MongoDB, said on a recent episode of The Cloudcast: "...Search as a bolt-on [and] entirely different system… has such a profoundly inconsistent experience that if you can bring it in to have near consistency in line with the database, that's a game changer…"

And with development teams taking on more and more work previously associated with separate domains, like analytics (described above), they're needing to use other systems that have also traditionally been glued together. So the question facing many organizations this year and beyond will be: why spend time moving data between separate glued-together solutions for things like search, visualization, and analytics when a single data platform can handle it all?

3. Apps are going to get a lot smarter

If you were to go back 15 years to 2008 (which, wow, can't believe that was 15 years ago, but anyway…) you'd notice just how radically the technology landscape has changed.
Cloud computing wasn't quite yet a thing back then, and mobile was really just getting off the ground. Today, an equally sizable shift is happening. In an interview with SiliconAngle this past November, MongoDB CEO Dev Ittycheria said: "I believe the next big platform shift is moving from dumb apps to smart apps that incorporate machine learning, AI, and very sophisticated automation."

As mentioned previously, development teams are taking on more work associated with previously separate domains. This is also happening with data analytics, which previously lived outside the application development process. But now analytics is "shifting left" directly into app development. The results for businesses are the ability for applications to process and analyze real-time data much faster and at lower cost, and to both understand trends and make more informed predictions based on those trends. The results for customers are greater personalization and richer digital experiences.

Building smarter applications is the future. But how quickly and effectively organizations do that still depends on their data platforms. Not all can bring analytics into app development in the same ways. In this respect, the future may be smarter applications; but for different businesses, to paraphrase author William Gibson, that future isn't evenly distributed. Yet.

4. Encryption, encryption, [$a&*9Qd]

Encryption will not only continue to be critical for how organizations store their data; it will also revolutionize how data is used in the application development process. Ask a lot of software veterans about data encryption and they're likely to tell you how important it is. They'll also likely say that encryption, particularly in-use encryption, can have scalability and/or complexity problems. But in 2023 and beyond, new advancements will make those issues a thing of the past.
With new technologies like Queryable Encryption, the ability to build smarter applications that use end-to-end encrypted data can move at the speed that development teams and businesses require. The added benefit is that this increases end-user trust. As MongoDB's chief information security officer Lena Smart said in an interview with SiliconAngle in December 2022: "By giving people things like Queryable Encryption, for example, you're going to free up a whole bunch of headspace. [Their customers] don't have to worry about their data being … harvested from memory or harvested while at rest or in motion."

The name of the game in 2023 will be 8QTwZm*

*encrypted for demonstration purposes.

5. Bottom line: Your data strategy is your business strategy

When we get to the end of December 2023, we'll probably look back on the intervening months and see a lot of stuff we didn't expect. What we do know is that data is going to play an increasingly important role in how businesses operate. Why do we know this? Because this has been the trend every year since organizations first started using data to build better software and richer digital experiences.

Software might be eating the world, and developers might be eating the work, but data is eating business. So in 2023, it's incumbent on business leaders to set the table accordingly. To get started building your data strategy with MongoDB, get in touch with our experts.