What is WiredTiger? What is MMAPv1?
WiredTiger is MongoDB’s new storage engine. It is available as an option for all 64-bit MongoDB 3.0 and higher builds. Among other features, it supports document-level locking and compression on disk. Check out MongoDB’s docs on WiredTiger for more information.
MMAPv1 is MongoDB’s traditional memory-mapped files storage engine. In MongoDB 3.0 we added collection-level locking while remaining compatible with the on-disk storage of MongoDB 2.6 and older.
Why you should match
Matching your currently running storage engine to your backed-up engine is critical to easy restores. If your restore files are in a different format than what you are used to running, you will have to make certain that you set your command line options correctly, and differently from what you are running elsewhere.
Where to change your setting
First, you have to be running the new MMS. MMS Classic customers cannot change their backup format, and only have MMAPv1 backups available to them. You are running the new MMS if the upper-left of your MMS window looks like this:
Once you are in the new MMS, you can change your backup format by going to the “Backup” tab, click the “…” for your replica set or cluster and choose “Edit Storage Engine”
Once there, you can choose MMAPv1 (“MongoDB Memory Mapped Files”) or WiredTiger:
Once you make this change, an initial sync will be triggered. Choose the server you want to sync from and confirm. An initial sync is required so we can build your new backup. This will not change your existing snapshot formats, so if you request an older restore, it will still be in MMAPv1. You can tell if a snapshot is in WiredTiger format by looking at the “Mongod Version” column on your snapshots listing page (just click on a replica set name on your Backup tab). If the version has “(wiredTiger)” after it, the snapshot is in WiredTiger format. You can see I converted this replica set to WiredTiger:
MongoDB and Leap Seconds
The short answer As the June 30, 2015 leap second event approaches, I have received a number of questions about how MongoDB is expected to behave during a leap second event. The short answer is “just fine.” MongoDB treats the observation of leap seconds similarly to the observation of clock skew between machines or the observation of other time-setting events, like manual clock adjustment. In more detail To understand why MongoDB is robust to leap seconds, it helps to think about how leap seconds affect the observation of wall clock time, especially the case where it can make it appear to processes that time has gone backwards, and about how MongoDB uses wall clock time. Leap seconds come in one of two forms: either an extra second added at the end of the last minute of a specific calendar day or the omission of the last second at the end of the last minute of a specific calendar day in UTC. So, this can lead to a time 23:59:60Z on a day with a leap second in the first case, or to time transitioning from 23:59:58Z to 00:00:00Z on a day with a leap second in the second case. Unfortunately, the time standard used by almost all computers defines a calendar day as being composed of 86,400 seconds. Two techniques are used to deal with this discrepancy. The cool but by far less common one is to make all the computer-reported seconds for a period of time leading up to the end of the leap-second day slightly longer or shorter than true seconds, “smearing” the leap second over several hours. Google apparently does this . The more mundane technique is for the OS clock to have the last second occur two times, from the point of view of observing processes, or to skip the last second, depending on the type of leap second. When the last second of the day occurs twice, an observer reading time with subsecond granularity could observe 23:59:59.800Z and subsequently observe 23:59:59:200Z, making it seem as though time has moved backwards. When the last second of the day is omitted, a process might believe that two seconds have passed when in fact only one has, because it observes 23:59:58Z and then 00:00:00Z. With this information about the observable effects of leap seconds in hand, we can now look at how this might affect MongoDB’s use of wall clock time. MongoDB uses wall clock time for the following: To generate diagnostic information, such as log messages; To record the wall clock time in fields of documents via the $currentDate update operator and related operators, and to generate OIDs; To generate “optime” fields in replication oplogs; To schedule periodic events, such as replication heartbeats or cursor expirations. Impact on Diagnostic Information Diagnostic data is used by human beings and tools such as MMS Monitoring to monitor the health of a MongoDB cluster, or to perform a forensic analysis after an observed failure. In these cases, the accuracy of the reported wall clock time aids in diagnosis, but is not required for correct operation of the cluster or for the analytic task. This must be so, because MongoDB clusters are distributed over asynchronous networks, and tight synchronization of clocks among the components of the system cannot be assured. One caveat in the forensics and monitoring use case is that, if your operating system might allow MongoDB to observe time moving backwards , some diagnostic log messages may indicate that an operation took a very long time when it in fact did not. These false positives for slow operations are typically easy to identify because they report absurdly long or negative durations (frequently on the order of two weeks, positive or negative). This can also occur if you manually reset your system clock during MongoDB operation. Impact on $currentDate et al When a client application requests a document be updated with the server’s notion of the current date and time, MongoDB simply asks the operating system for the current wall clock time and records that value in a client document. Any impact of clock adjustments for leap seconds or otherwise will effectively be passed through to the client application. Applications that require second-granularity precision of timestamps should be examined, whether or not they use MongoDB, as should the time synchronization technology used in support of that application (typically NTP). Impact on the replica set oplog MongoDB replica sets use a replicated operation log, or oplog, to inform secondary nodes of changes to make in order to stay consistent with the primary node. These changes are kept in a total order, described by an “optime”, sometimes called the timestamp. This optime is composed of wall clock time paired with an " increment ", an integer which uniquely identifies operations that execute during the same wall clock time. For example, the first operation recorded at 23:59:59Z would be recorded as optime (23:59:59Z,1) and the third operation would have optime (23:59:59Z,3). But wall clock time is not used indiscriminately, because system clocks can drift, or be reset. The time portion of the optime is actually the maximum of the current observed time and the greatest previous observation. If MongoDB records operation A with an optime of (23:59:59Z,1), and then observes a time of 23:59:58Z when it attempts to log a subsequent operation B, it will act as if operation B occurred during 23:59:59Z, and thus log it with an optime of (23:59:59Z,2).In addition to leap seconds, unsynchronized clocks between replica set members may cause the optime to be ahead of any one node’s local wall clock time. This situation is common and does not negatively affect replication operation. Impact on the scheduling of periodic tasks The final way that MongoDB uses wall clock time is to schedule periodic activities, like sending heartbeats to replica set nodes, cleaning up expired cursors or invalidating caches that use age-based invalidation policies. These activities are typically scheduled to run after some amount of wall clock time has elapsed, rather than at specific absolute wall clock times; the difference is not material. In either event, the introduction of a positive leap second may cause an event to occur later than it otherwise would have, and the introduction of a negative leap second may cause an event to occur sooner than it otherwise would have. MongoDB’s algorithms must already be robust to these behaviors, because they are typically indistinguishable from delays induced by higher-than-average network latency or virtual machine and operating system scheduling issues. Your Operating System matters Remember, MongoDB relies on host operating system capabilities for reading the wall clock time, and for synchronizing events with wall clock time. As such, you should ensure that the operating system running under MongoDB is itself prepared for leap seconds. The most widely documented database problems during the June 2012 leap second were actually caused by a livelock bug in the Linux kernel futex synchronization primitive. The DataStax developer blog has a brief summary of the cause of the June 2012 issue in Cassandra, which correctly assigns responsibility to a since-resolved issue in the Linux kernel. If you use Red Hat Enterprise Linux, they have a nice knowledge base article that covers the topic of leap second preparedness for RHEL. If you’re running on Windows, Microsoft has a very brief knowledge base article on the subject of leap seconds. If you’re interested in learning more about the operational best practices of MongoDB, download our guide: Learn Best Practices for Operations About the Author - Andy Andy Schwerin is the Director of Distributed Systems Engineering at MongoDB in New York.
MACH Aligned for Retail (Microservices, API-First, Cloud Native SaaS, Headless)
Across the Retail industry, MACH principles and the Mach Alliance are becoming increasingly common. What is MACH and why is it being embraced for Retail? The MACH Alliance is a non-profit organization fostering the adoption of composable architecture principles. It stands for Microservices, API-First, Cloud-Native SaaS and Headless. The MACH Alliance’s Manifesto is to: “Future proof enterprise technology and propel current and future digital experiences" The MACH Alliance and the creation of this set of principles originated in the Retail Industry. Several of the 5 co-founders of the MACH Alliance are technology companies building for retail use cases: for example commercetools is a composable commerce platform for retail (built completely on MongoDB). MongoDB has been a member of the MACH Alliance since 2020, as an “enabler” member, meaning use of our technology can enable the implementation of the MACH principles in application architectures. This is because a data layer built on MongoDB is ideal as the basis for a MACH architecture. Members of our Industry Solutions team sit on the MACH technology, growth and marketing councils, and actively are involved with furthering the adoption of MACH across the Retail Industry What is MACH, why is it important for retail? The retail industry has long been a fast adopter of technology and a forerunner in technology trends. This is because of the competitive nature of the business leading a drive towards innovation- its vital that retails are able to react quickly to new technologies (e.g. NFTs, VR, AI) to capture market share and stay ahead of the competitors. Retailers have realized that to be able to deliver new and value-add experiences to their customers, they have to cut back on operational overhead that leads to increased cost and build standard functionality that can either be bought or re-used. This is where the benefits of MACH comes in- it's all about increasing the ability to deliver innovation quickly while lowering operational costs & risk. Microservices: An approach to building applications in which business functions are broken down into smaller, self-contained components called services. These services function autonomously and are usually developed and deployed independently. This means the failure or outage of one microservice will not affect another and teams can develop in parallel, increasing efficiency. API-First: A style of development where the sharing and use of the data via API (application programming interface) is considered first and foremost in the development process. This means that services are designed to aid the easy sharing of information across the organization and simple interconnectivity of systems. Cloud-Native SaaS: Cloud-native SaaS solutions are vendor-managed applications developed in and for the cloud, and leveraging all the capabilities the cloud has to offer, such as fully managed hosting, built-in security, auto-scaling, cross-regional deployment and automatic updates. These are a good fit for a MACH architecture as adopting them can reduce operational costs and frees up developers for value-add work like new unique customer experiences. Headless: Decoupling the front end from the back-end so that front ends (or “heads”) can be created or iterated on with no dependencies on the back end. The fact that the layers are loosely coupled decreases time to market for new front ends, and encourages the re-use back-end services for multiple purposes. It also de-risks change in the long term as services can function independently. Where does MongoDB come in? MongoDB is an enabler for MACH, meaning that using MongoDB as your data layer helps retailers and retail software companies. achieve MACH compliance. Our data model, architecture and functionality empower IT organizations to build in line with these architecture principles. During a digital transformation, where a retailer is modernizing a monolith into a microservices based architecture, they're looking for a data layer which will enable speed of development & change. MongoDB is the "most wanted" database 4 years running on Stack Overflow's developer survey- this is because our document model maps to the way developers are thinking & coding, and the flexibility allows for iterative change of the data layer. When looking at API based communication, the standard format for APIs is JSON, which again maps to MongoDB's document model. The idea with API-first development is to develop with the API in mind- why not store the data the way you're going to serve it by API. This reduces complexity and increases performance. Cloud Native and SaaS products have become the norm as retailers wish to reduce maintenance and management work. MongoDB Atlas, provides a database-as-a-service, guaranteeing 99.995% uptime, automatic failover and self-healing and allowing DevOps engineers to spin up databases in minutes or by API/ script. Many retail software companies are also built on MongoDB Atlas- for example commercetools, which provides an ecommerce solution as a SaaS product. Headless architectures require a data layer that is able to adapt and change for new workloads. The ability to change the schema at runtime, with no downtime, makes MongoDB's document model ideal for this. Performance and the ability to scale for new "heads" is also important. MongoDB is known as a high performance database and can scale vertically automatically or scale out horizontally seamlessly. So MongoDB becomes a great choice for retailers choosing to adopt a MACH architecture (see figure 1 below). As a general purpose database with high performance, a rich expressive query language and secondary indexing, MongoDB is a really good fit as a data layer as it is capable of handling operational and analytical needs of the application. FIgure 1: Example of a MACH architecture Want to know more? Are you interested in a transition to MACH? Dive into our four part blog series exploring each topic in detail and how MongoDB supports each of these principles: Microservices API-First Cloud-Native SaaS Headless