What Storage Engine Should my Backups Use?
What is WiredTiger? What is MMAPv1?
WiredTiger is MongoDB’s new storage engine. It is available as an option for all 64-bit MongoDB 3.0 and higher builds. Among other features, it supports document-level locking and compression on disk. Check out MongoDB’s docs on WiredTiger for more information.
MMAPv1 is MongoDB’s traditional memory-mapped files storage engine. In MongoDB 3.0 we added collection-level locking while remaining compatible with the on-disk storage of MongoDB 2.6 and older.
Why you should match
Matching the storage engine of your backups to the storage engine you currently run is critical to easy restores. If your restore files are in a different format from the one you normally run, you will have to make certain that you set your command-line options correctly, and differently from what you are running elsewhere.
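As a concrete example of such a command-line difference, mongod selects its storage engine at startup with the --storageEngine option (MongoDB 3.0 and later, accepting mmapv1 or wiredTiger; 3.0 defaults to MMAPv1). The sketch below uses a hypothetical restore_command helper with placeholder paths and ports. It only builds the argument list for a restored data directory and does not start a server.

```python
import shlex
from typing import List

# Values accepted by mongod's --storageEngine option in MongoDB 3.0.
VALID_ENGINES = {"mmapv1", "wiredTiger"}


def restore_command(dbpath: str, engine: str, port: int = 27017) -> List[str]:
    """Build a mongod argument list whose storage engine matches the restored files.

    Hypothetical helper: dbpath and port are placeholders for your own setup.
    """
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown storage engine: {engine!r}")
    return ["mongod", "--dbpath", dbpath, "--port", str(port), "--storageEngine", engine]


cmd = restore_command("/data/restore", "wiredTiger")
print(shlex.join(cmd))  # mongod --dbpath /data/restore --port 27017 --storageEngine wiredTiger
```

Making the engine an explicit, validated parameter forces every restore to state its format. Relying on the server default is exactly where a mismatch slips through.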
Where to change your setting
First, you have to be running the new MMS. MMS Classic customers cannot change their backup format, and only have MMAPv1 backups available to them. You are running the new MMS if the upper-left of your MMS window looks like this:
Once you are in the new MMS, you can change your backup format by going to the “Backup” tab, clicking the “…” for your replica set or cluster, and choosing “Edit Storage Engine”.
Once there, you can choose MMAPv1 (“MongoDB Memory Mapped Files”) or WiredTiger:
Once you make this change, an initial sync will be triggered. Choose the server you want to sync from and confirm. An initial sync is required so we can build your new backup. This will not change your existing snapshot formats, so if you request an older restore, it will still be in MMAPv1. You can tell if a snapshot is in WiredTiger format by looking at the “Mongod Version” column on your snapshots listing page (just click on a replica set name on your Backup tab). If the version has “(wiredTiger)” after it, the snapshot is in WiredTiger format. You can see I converted this replica set to WiredTiger:
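If you check snapshot formats in a script rather than by eye, the “(wiredTiger)” suffix described above is enough to tell the two formats apart. This sketch makes several assumptions:

- Version labels look like "3.0.4 (wiredTiger)" or a bare "3.0.4".
- Anything without the suffix is MMAPv1, the only other backup format.
- snapshot_engine is a hypothetical helper, not an MMS API.

```python
import re


def snapshot_engine(mongod_version_label: str) -> str:
    """Infer a snapshot's storage engine from its "Mongod Version" label.

    Assumption: WiredTiger snapshots carry a "(wiredTiger)" suffix and
    everything else is MMAPv1.
    """
    if re.search(r"\(wiredTiger\)\s*$", mongod_version_label):
        return "wiredTiger"
    return "mmapv1"


for label in ["3.0.4 (wiredTiger)", "3.0.4", "2.6.9"]:
    print(label, "->", snapshot_engine(label))
```

The returned names match the values mongod accepts for --storageEngine, so the result can feed straight into a restore command line.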
MongoDB and Leap Seconds
The short answer
As the June 30, 2015 leap second event approaches, I have received a number of questions about how MongoDB is expected to behave during a leap second event. The short answer is “just fine.” MongoDB treats the observation of leap seconds much as it treats the observation of clock skew between machines or of other time-setting events, such as a manual clock adjustment.
In more detail
To understand why MongoDB is robust to leap seconds, it helps to think about two things. The first is how leap seconds affect the observation of wall clock time, especially when they make it appear to processes that time has gone backwards. The second is how MongoDB uses wall clock time.
Leap seconds come in one of two forms: either an extra second is added at the end of the last minute of a specific calendar day in UTC, or the last second of that minute is omitted. The first case produces a time of 23:59:60Z on the leap-second day. In the second case, time transitions directly from 23:59:58Z to 00:00:00Z.
Unfortunately, the time standard used by almost all computers defines a calendar day as being composed of exactly 86,400 seconds. Two techniques are used to deal with this discrepancy. The cool but far less common one is to make all the computer-reported seconds for a period of time leading up to the end of the leap-second day slightly longer or shorter than true seconds, “smearing” the leap second over several hours. Google apparently does this. The more mundane technique is for the OS clock to have the last second occur twice, from the point of view of observing processes, or to skip it, depending on the type of leap second. When the last second of the day occurs twice, an observer reading time with subsecond granularity could observe 23:59:59.800Z and subsequently observe 23:59:59.200Z, making it seem as though time has moved backwards.
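The difference between the two techniques can be sketched by mapping true elapsed time to the time a process would read. The sketch rests on these assumptions, which are illustrative only and not a description of Google's smear or of any operating system's implementation:

- A positive leap second.
- Times in seconds since midnight UTC on the leap-second day, so that day lasts 86,401 true seconds.
- A linear smear over a 20-hour window.
- Hypothetical function names.

```python
DAY_END = 86_401.0  # true seconds in a day with one positive leap second


def stepped_clock(true_secs: float) -> float:
    """Step model: once true time reaches 86,400 s, the clock re-runs 23:59:59."""
    return true_secs if true_secs < 86_400.0 else true_secs - 1.0


def smeared_clock(true_secs: float, window_secs: float = 20 * 3600) -> float:
    """Linear smear model: absorb the extra second gradually over window_secs."""
    start = DAY_END - window_secs
    fraction = min(max((true_secs - start) / window_secs, 0.0), 1.0)
    return true_secs - fraction


# Two reads 0.4 true seconds apart that straddle the repeated second.
early, late = 86_399.8, 86_400.2
assert stepped_clock(late) < stepped_clock(early)  # step model: time appears to go backwards
assert smeared_clock(late) > smeared_clock(early)  # smear model: time keeps moving forwards
assert smeared_clock(DAY_END) == 86_400.0          # both models reach midnight together
```

The smear trades accuracy for monotonicity. Every reported second in the window is slightly long, but no observer ever sees the clock run backwards.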
When the last second of the day is omitted, a process might believe that two seconds have passed when in fact only one has, because it observes 23:59:58Z and then 00:00:00Z.
With this information about the observable effects of leap seconds in hand, we can now look at how they might affect MongoDB’s use of wall clock time. MongoDB uses wall clock time for the following purposes: to generate diagnostic information, such as log messages; to record the wall clock time in document fields via the $currentDate update operator and related operators, and to generate OIDs; to generate “optime” fields in replication oplogs; and to schedule periodic events, such as replication heartbeats or cursor expirations.
Impact on Diagnostic Information
Diagnostic data is used by human beings and by tools such as MMS Monitoring to monitor the health of a MongoDB cluster, or to perform a forensic analysis after an observed failure. In these cases, the accuracy of the reported wall clock time aids diagnosis, but is not required for correct operation of the cluster or for the analytic task. This must be so, because MongoDB clusters are distributed over asynchronous networks, and tight synchronization of clocks among the components of the system cannot be assured.
One caveat in the forensics and monitoring use case concerns operating systems that might allow MongoDB to observe time moving backwards. On such systems, some diagnostic log messages may indicate that an operation took a very long time when in fact it did not. These false positives for slow operations are typically easy to identify because they report absurdly long or negative durations (frequently on the order of two weeks, positive or negative). This can also occur if you manually reset your system clock while MongoDB is running.
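To see why these false positives are easy to spot, consider durations computed by subtracting one wall-clock reading from another. The sketch flags any duration that is negative or implausibly large. The readings, the flag_suspect_durations helper, and the one-hour plausibility threshold are all hypothetical. This is not how MongoDB or MMS computes or filters slow-operation reports.

```python
from typing import List, Tuple


def flag_suspect_durations(
    readings: List[Tuple[int, int]], max_plausible_ms: int
) -> List[int]:
    """Return the indexes of (start_ms, end_ms) pairs whose wall-clock duration
    is negative or longer than max_plausible_ms (hypothetical helper)."""
    suspects = []
    for i, (start_ms, end_ms) in enumerate(readings):
        duration_ms = end_ms - start_ms
        # A negative duration can only come from a clock that moved backwards.
        if duration_ms < 0 or duration_ms > max_plausible_ms:
            suspects.append(i)
    return suspects


# Milliseconds since midnight UTC. The second pair straddles a repeated
# 23:59:59 (23:59:59.800Z, then 23:59:59.200Z on the second pass).
readings = [(1_000, 1_250), (86_399_800, 86_399_200), (5_000, 5_040)]
print(flag_suspect_durations(readings, max_plausible_ms=3_600_000))  # [1]
```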
Impact on $currentDate et al
When a client application requests that a document be updated with the server’s notion of the current date and time, MongoDB simply asks the operating system for the current wall clock time and records that value in the client’s document. Any effect of clock adjustments, whether for leap seconds or otherwise, is effectively passed through to the client application. Applications that depend on timestamps accurate to the second should be examined, whether or not they use MongoDB. So should the time synchronization technology that supports them (typically NTP).
Impact on the replica set oplog
MongoDB replica sets use a replicated operation log, or oplog, to inform secondary nodes of the changes they must make to stay consistent with the primary node. These changes are kept in a total order, described by an “optime”, sometimes called the timestamp. The optime is composed of wall clock time paired with an “increment”, an integer that uniquely identifies operations executed during the same wall clock time. For example, the first operation recorded at 23:59:59Z would be recorded as optime (23:59:59Z,1) and the third operation would have optime (23:59:59Z,3).
But wall clock time is not used indiscriminately, because system clocks can drift or be reset. The time portion of the optime is actually the maximum of the current observed time and the greatest previous observation. Suppose MongoDB records operation A with an optime of (23:59:59Z,1) and then observes a time of 23:59:58Z when it attempts to log a subsequent operation B. It will act as if operation B occurred during 23:59:59Z, and thus log it with an optime of (23:59:59Z,2). In addition to leap seconds, unsynchronized clocks between replica set members may cause the optime to be ahead of any one node’s local wall clock time. This situation is common and does not negatively affect replication.
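The max-of-observations rule for optimes can be sketched in a few lines. This is a simplified model, not MongoDB's implementation, and it makes these assumptions:

- Times are whole seconds since midnight UTC.
- The increment restarts at 1 whenever the time portion advances, consistent with the (23:59:59Z,1) … (23:59:59Z,3) example.
- OptimeGenerator is a hypothetical name.

```python
from typing import Optional, Tuple


class OptimeGenerator:
    """Hypothetical model of optime assignment: (time, increment) pairs that
    never go backwards, even when the observed wall clock does."""

    def __init__(self) -> None:
        self.last_secs: Optional[int] = None
        self.last_inc = 0

    def next(self, observed_secs: int) -> Tuple[int, int]:
        if self.last_secs is None or observed_secs > self.last_secs:
            # The clock moved forward: adopt the new time, restart the increment.
            self.last_secs = observed_secs
            self.last_inc = 1
        else:
            # Same or earlier reading: keep the greatest time seen, bump the increment.
            self.last_inc += 1
        return (self.last_secs, self.last_inc)


gen = OptimeGenerator()
print(gen.next(86_399))  # operation A at 23:59:59Z  -> (86399, 1)
print(gen.next(86_398))  # clock reads 23:59:58Z     -> (86399, 2)
print(gen.next(86_400))  # 00:00:00Z the next day    -> (86400, 1)
```

Each new pair is strictly greater than the last. Comparing optimes as tuples therefore gives a total order, whatever the local clock reports.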
Impact on the scheduling of periodic tasks
The final way that MongoDB uses wall clock time is to schedule periodic activities, like sending heartbeats to replica set nodes, cleaning up expired cursors, or invalidating caches that use age-based invalidation policies. These activities are typically scheduled to run after some amount of wall clock time has elapsed, rather than at specific absolute wall clock times. For this discussion, the difference is not material. In either case, a positive leap second may cause an event to occur later than it otherwise would have, and a negative leap second may cause an event to occur sooner. MongoDB’s algorithms must already be robust to these behaviors, because they are typically indistinguishable from delays induced by higher-than-average network latency or by virtual machine and operating system scheduling issues.
Your Operating System matters
Remember, MongoDB relies on host operating system capabilities for reading the wall clock time and for synchronizing events with wall clock time. As such, you should ensure that the operating system running under MongoDB is itself prepared for leap seconds. The most widely documented database problems during the June 2012 leap second were actually caused by a livelock bug in the Linux kernel’s futex synchronization primitive. The DataStax developer blog has a brief summary of the cause of the June 2012 issue in Cassandra, which correctly assigns responsibility to a since-resolved issue in the Linux kernel. If you use Red Hat Enterprise Linux, Red Hat has a nice knowledge base article that covers leap second preparedness for RHEL. If you’re running on Windows, Microsoft has a very brief knowledge base article on the subject of leap seconds.
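The later-or-sooner effect on periodic tasks can be sketched with a toy clock model. Everything here is an illustrative assumption, not MongoDB's scheduler:

- Times are integer milliseconds of true elapsed time since midnight.
- The hypothetical wall_ms clock steps by one second at a chosen instant.
- The hypothetical true_fire_time function polls that clock until a task scheduled "interval_ms of wall clock time from now" becomes due.

```python
def wall_ms(true_ms: int, leap_at_ms: int, leap_ms: int) -> int:
    """Hypothetical OS wall clock that steps by -leap_ms at true time leap_at_ms.

    leap_ms=1000 models a positive leap second (23:59:59 repeats),
    leap_ms=-1000 a negative one (23:59:59 is skipped), 0 no leap second.
    """
    return true_ms if true_ms < leap_at_ms else true_ms - leap_ms


def true_fire_time(start_ms: int, interval_ms: int, leap_at_ms: int,
                   leap_ms: int, tick_ms: int = 100) -> int:
    """True time at which a task due interval_ms of *wall clock* time after
    start_ms first finds itself due, polling every tick_ms."""
    deadline = wall_ms(start_ms, leap_at_ms, leap_ms) + interval_ms
    t = start_ms
    while wall_ms(t, leap_at_ms, leap_ms) < deadline:
        t += tick_ms
    return t


START, INTERVAL = 86_395_000, 10_000  # scheduled at 23:59:55Z, due 10 s later
CASES = {
    "no leap second": (86_400_000, 0),
    "positive (23:59:59 repeats)": (86_400_000, 1000),
    "negative (23:59:59 skipped)": (86_399_000, -1000),
}
for name, (leap_at, leap) in CASES.items():
    fired = true_fire_time(START, INTERVAL, leap_at, leap)
    print(f"{name}: fires after {(fired - START) / 1000:.1f} true seconds")
```

In this model the task fires after 10, 11, and 9 true seconds respectively. A one-second shift is small next to the network and scheduling delays such timers must already tolerate.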
If you’re interested in learning more about MongoDB operational best practices, download our guide: Learn Best Practices for Operations.
About the Author
Andy Schwerin is the Director of Distributed Systems Engineering at MongoDB in New York.
Using MongoDB Skill Scanner to Build Better Training Programs
Technology leaders know that transformation is about more than just adopting modern technologies like MongoDB. The entire organization has to rally behind change, which is no easy task. The skills that modern development teams need are evolving faster than ever, and hiring to fill skills gaps can be too time-consuming and expensive for many organizations. So it’s imperative that we plan how to bring our people with us on our modernization journey, and proactively upskill them on the technologies we’re betting on. After all, what happens if you choose MongoDB but your developers don’t know how to use it? CIOs know that training programs are easier said than done. EY reported that 30% of CIOs acknowledge that their training programs are ineffective, and that they’re struggling to retain talent because of it. These leaders come to us for help building and executing their MongoDB training programs. They seek advice on two extremely common yet critical challenges: How do we get away from the less effective one-size-fits-all approach? How do we measure the ROI of our training program and connect it to business impact?
How we use MongoDB Skill Scanner to overcome training challenges
Our Professional Services team uses a tool called MongoDB Skill Scanner to address both of these challenges. This tool helps us provide three benefits to customers looking to build a training program:
Improved MongoDB proficiency: Teams can use Skill Scanner to quickly and easily assess their members’ MongoDB skill gaps and gain a comprehensive understanding of the team’s MongoDB skills baseline.
Increased productivity and accuracy: When team members have a comprehensive understanding of MongoDB, they can work more quickly and accurately on projects, leading to higher productivity and a higher quality of work.
Time and money saved with targeted training: Using Skill Scanner, customers can avoid wasting time and money on trial-and-error learning. Instead, they can focus on improving their skills in a more targeted and efficient way with right-sized training plans.
By leveraging this data, our customers’ engineers can engage in the right training at the right time, targeted to their job role and specific skill shortages. When a training program is built this way, engineers maximize their knowledge retention and minimize time away from their projects. Skill Scanner includes three role-based assessments, one each for developers, database administrators, and DevOps engineers. Through a series of multiple-choice questions, Skill Scanner gives customers a clear understanding of their level of expertise across a set of technical skills that are critical for success in their role. After submitting the assessment, engineers get results for each skill area indicating whether they are at a beginner, intermediate, or advanced level.
Why data-driven training programs matter
We’ve learned that it’s not enough to just tell teams to watch training videos or webinars on their own, or to place everyone in the same one-size-fits-all program. Skills gaps vary from team to team and from individual to individual. A one-size-fits-all program may not address individual learners’ needs, wasting their time and making it harder for them to acquire new skills. We interpret the Skill Scanner results to help determine which training courses your team should take. But we don’t only capture this data before training. We also use Skill Scanner again after training programs are completed to see where immediate improvements have been made. This helps technology leaders prove the impact and ROI of their training, and gives them confidence that their teams are ready to be successful with MongoDB.
Developing a Precision Learning Program
To go even further, our team can work with you to build a Precision Learning Program (PLP), where we use Skill Scanner data to build learning schedules that are unique to each individual. These schedules include a variety of short, blended learning events such as classes, technical workshops, self-paced exercises, and project coaching. We’ve seen PLP lead to higher knowledge retention and, of course, measurable project results. A customer who recently concluded their PLP saw a 43% increase in knowledge retention.
Getting started building a personalized training program
Skill gaps aren’t a new problem for IT leaders. But with new digital courses, training, and technologies, the resources to close these gaps are at your fingertips. Skill Scanner and the Precision Learning Program have been specifically designed to empower teams with targeted training that deepens their understanding of MongoDB. These short training events are carefully crafted to close skill gaps without compromising developer productivity. We’ve seen a variety of customers use this tool to meet their teams’ individual training needs: upskilling new hires, starting projects with new MongoDB products, migrating to MongoDB Atlas, and more. It also saves your business the hours developers would otherwise waste searching for answers (and developers don’t want to spend their time that way, either).
“We need help getting from point A to point B and feel MongoDB is uniquely positioned to help,” said a CTO at a large insurance firm.
If you’re interested in trying out MongoDB Skill Scanner or want to explore the MongoDB Precision Learning Program further, you can reach out to your account representative or contact us directly.