Using the MongoDB Management Service, you can now upgrade your MongoDB deployment safely and quickly from 2.6.9 to 3.0.1 via the user interface. To upgrade your own deployment, follow the tutorial below. As always, if you should run into any trouble, you can reach out to MongoDB via the MMS Support page.
Step 0. Considerations Before Upgrading
- MongoDB 3.0.1 introduces significant changes to the authorization model. If you are using authSchemaVersion 1 (the schema used in MongoDB 2.4 and earlier), you must upgrade to authSchemaVersion 3 before upgrading to 3.0.1. If you are using Automation, you can perform the schema upgrade by clicking the wrench for your deployment. If you haven't imported for Automation yet, here are the manual instructions; a quick sketch for checking and upgrading the schema version follows this list.
- You cannot upgrade directly from 2.4 to 3.0; you must upgrade to 2.6 first.
- Driver compatibility: As long as you leave the authSchemaVersion at 3 and stay on MMAPv1, your old driver should continue to work. However, if you upgrade the authSchemaVersion to 5, or switch to WiredTiger, a MongoDB 3.0-compatible driver will be necessary. Many third-party tools will not work after you upgrade to authSchemaVersion 5, so check for MongoDB 3.0-compatible versions of your utilities.
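If you want to confirm which auth schema version your deployment is on before you begin, you can read it from the admin database. Here is a minimal PyMongo sketch, assuming an admin user with the privileges to run the upgrade; the hostname and credentials are hypothetical, and authSchemaUpgrade is the same command the manual instructions use:

from pymongo import MongoClient

# Hypothetical connection string; substitute an admin user from your deployment.
client = MongoClient('mongodb://admin:secret@member1:27017')

# The current auth schema version is stored in admin.system.version.
doc = client.admin.system.version.find_one({'_id': 'authSchema'})
print(doc)  # e.g. {'_id': 'authSchema', 'currentVersion': 3}

# If the document is missing or currentVersion is 1, run the manual
# upgrade before moving to 3.0.1.
if doc is None or doc.get('currentVersion', 1) < 3:
    client.admin.command('authSchemaUpgrade')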
This is not a guide to upgrading to WiredTiger, but there is a separate tutorial for using Automation to change your storage engine to WiredTiger.
Step 1. Review Your Current Deployment
On the deployment page, you should see that your current replica set has recent pings, indicated by the green light to the right of each member, and has no startup warnings. I've set up a sharded cluster here, but the process is the same.
If any server does not have an Automation Agent on it, you will need to download the agent from the Administration -> Agents page, install it, and make sure that you have imported your deployment for Automation. If one of the hosts is unreachable, you will need to resolve this issue as well before you can upgrade the replica set.
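If you also want to confirm reachability from your application's point of view, a few lines of PyMongo will do. This is just an illustrative sketch with hypothetical hostnames, not part of the MMS workflow itself:

from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

# Hypothetical replica set members.
for host in ['member1:27017', 'member2:27017', 'member3:27017']:
    try:
        MongoClient(host).admin.command('ping')  # cheap round-trip check
        print('%s is reachable' % host)
    except ConnectionFailure as exc:
        print('%s is NOT reachable: %s' % (host, exc))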
Step 2. Select the Deployment and Modify
Click the version drop-down, select 3.0.1, and then click "Apply". Review and deploy your changes. The Automation Agents will then download the proper MongoDB version and perform the upgrade in the proper order for you.
If you do not see 3.0.1 in your list of MongoDB versions, then it is not enabled in your Version Manager. You can enable it by going to “Deployment” and then selecting “Version Manager”. Add the versions you wish to upgrade to and click “Review & Deploy” and then “Confirm & Deploy”.
Step 3. Enjoy Your Updated Deployment
If you are ready with 3.0-compatible drivers and utilities, you can now change the authSchemaVersion to 5 via the modifications sidebar from Step 2, above.
Announcing PyMongo 3.0
PyMongo 3.0 is a partial rewrite of the Python driver for MongoDB. More than six years after the first release of the driver, this is the biggest release in PyMongo's history. Bernie Hackett, Luke Lovett, Anna Herlihy, and I are proud of its many improvements and eager for you to try it out. I will shoehorn the major improvements into four shoes: conformance, responsiveness, robustness, and modernity.

Conformance

The motivation for PyMongo's overhaul is to supersede or remove its many idiosyncratic APIs. We want you to have a clean interface that is easy to learn and, more importantly, closely matches the interfaces of our other drivers.

CRUD API

Mainly, "conformance" means we have implemented the same interface for create, read, update, and delete operations as the other drivers have, as standardized in the Driver CRUD API. My colleague Jeremy Mikola will explain the full spec in a separate article soon, but here is the gist. The familiar old methods work the same in PyMongo 3, but they are deprecated:

- save()
- insert()
- update()
- remove()
- find_and_modify()

These methods were vaguely named. For example, "update" updates or replaces some or all matching documents depending on its arguments. The arguments to "save" and "remove" are likewise finicky, and the many options for "find_and_modify" are intimidating. Other MongoDB drivers do not have exactly the same arguments in the same order for all these methods. If you or other developers on your team are using a driver from a different language, it makes life a lot easier to have consistent interfaces.

The new CRUD API names its methods like "update_one", "insert_many", and "find_one_and_delete": they say what they mean and mean what they say. Even better, all MongoDB drivers have exactly the same methods with the same arguments. See the spec for details.
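To give a taste of the new names, here is a minimal sketch contrasting a deprecated call with its PyMongo 3 equivalents; the collection and documents are hypothetical:

>>> from pymongo import MongoClient
>>> collection = MongoClient().test.users  # hypothetical collection
>>>
>>> # PyMongo 2 style (deprecated): "update" may update or replace,
>>> # depending on its arguments.
>>> collection.update({'name': 'Ann'}, {'$set': {'age': 31}})
>>>
>>> # PyMongo 3: each method says what it means.
>>> collection.update_one({'name': 'Ann'}, {'$set': {'age': 31}})
>>> collection.insert_many([{'name': 'Bob'}, {'name': 'Carla'}])
>>> collection.find_one_and_delete({'name': 'Bob'})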
One Client Class

In the past we had three client classes: Connection for any one server, and ReplicaSetConnection to connect to a replica set. We also had a MasterSlaveConnection that could distribute reads to slaves in a master-slave set. In November 2012 we created new classes, MongoClient and MongoReplicaSetClient, with better default settings, so now PyMongo had five clients. Even more confusingly, MongoClient could connect to a set of mongos servers and do hot failover. As I wrote on my blog, the fine distinctions between the client classes baffled users. And the set of clients we provided did not conform with other drivers. But since PyMongo is among the most-used of all Python libraries we waited long, and thought hard, before making major changes.

The day has come. MongoClient is now the one and only client class for a single server, a set of mongoses, or a replica set. It includes the functionality that had been split into MongoReplicaSetClient: it can connect to a replica set, discover all its members, and monitor the set for stepdowns, elections, and reconfigs. MongoClient now also supports the full ReadPreference API. MongoReplicaSetClient lives on for a time, for compatibility's sake, but new code should use MongoClient exclusively. The obsolete Connection, ReplicaSetConnection, and MasterSlaveConnection are removed.

The options you pass to MongoClient in the URI or as arguments now completely control the client's behavior:

>>> # Connect to one standalone, mongos, or replica set member.
>>> client = MongoClient('mongodb://server')
>>>
>>> # Connect to a replica set.
>>> client = MongoClient(
...     'mongodb://member1,member2/?replicaSet=my_rs')
>>>
>>> # Load-balance among mongoses.
>>> client = MongoClient('mongodb://mongos1,mongos2')

This is exciting because PyMongo applications are now so easy to deploy: your code simply loads a MongoDB URI from an environment variable or config file and passes it to a MongoClient. Code and configuration are cleanly separated. You can move smoothly from your laptop to a test server to the cloud, simply by changing the URI.

Non-Conforming Features

PyMongo 2 had some quirky features it did not share with other drivers. For one, we had a "copy_database" method that only one other driver had, and which almost no one used. It was hard to maintain and we believe you want us to focus on the features you use, so we removed it.

A more pernicious misfeature was the "start_request" method. It bound a thread to a socket, which hurt performance without actually guaranteeing monotonic write consistency. It was overwhelmingly misused, too: new PyMongo users naturally called "start_request" before starting a request, but in truth the feature had nothing to do with its name. For the history and details, including some entertaining (in retrospect) tales of Python threadlocal bugs, see my article on the removal of start_request.

Finally, the Python team rewrote our distributed-systems internals to conform to the new standards we have specified for all our drivers. But if you are a Python programmer you may care only a little that the new code conforms to a spec; it is more interesting to you that the new code is responsive and robust.

Responsiveness

PyMongo 3's MongoClient can connect to a single server, a replica set, or a set of mongoses. It finds servers and reacts to changing conditions according to the Server Discovery And Monitoring spec, and it chooses which server to use for each operation according to the Server Selection Spec. David Golden and I explained these specs in general in the linked articles, but I can describe PyMongo's implementation here.

Replica Set Discovery And Monitoring

In PyMongo 2, MongoReplicaSetClient used a single background thread to monitor all replica set members in series. So a slow or unresponsive member could block the thread for some time before the thread moved on to discover information about the other members, like their network latencies or which member is primary. If your application was waiting for that information—say, to write to the new primary after an election—these delays caused unneeded seconds of downtime.

When PyMongo 3's new MongoClient connects to a replica set it starts one thread per mongod server. The threads fan out to connect to all members of the set in parallel, and they start additional threads as they discover more members. As soon as any thread discovers the primary, your application is unblocked, even while the monitor threads collect more information about the set. This new design improves PyMongo's response time tremendously. If some members are slow or down, or you have many members in your set, PyMongo's discovery is still just as fast. I will explain and demonstrate the new design in my MongoDB World talk, "Drivers And High Availability: Deep Dive."

Mongos Load-Balancing

Our multi-mongos behavior is improved, too. A MongoClient can connect to a set of mongos servers:

>>> # Two mongoses.
>>> client = MongoClient('mongodb://mongos1,mongos2')

The behavior in PyMongo 2 was "high availability": the client connected to the lowest-latency mongos in the list, and used it until a network error prompted it to re-evaluate their latencies and reconnect to one of them.
If the driver chose unwisely at first, it stayed pinned to a higher-latency mongos for some time. In PyMongo 3, the background threads monitor the client's network latency to all the mongoses continuously, and the client distributes operations evenly among those with the lowest latency. See "mongos Load Balancing" for more information.

Throughput

Besides PyMongo's improved responsiveness to changing conditions in your deployment, its throughput is better too. We have written a faster and more memory-efficient pure-Python BSON module, which is particularly important for PyPy, and made substantial optimizations in our C extensions.

Robustness

Disconnected Startup

The first change you may notice is that MongoClient's constructor no longer blocks while connecting. It does not raise ConnectionFailure if it cannot connect:

>>> client = MongoClient('mongodb://no-host.com')
>>> client
MongoClient('no-host.com', 27017)

The constructor returns immediately and launches the connection process on background threads. Of course, foreground operations might time out:

>>> client.db.collection.find_one()
AutoReconnect: No servers found yet

Meanwhile, the client's background threads keep trying to reach the server. This is a big win for web applications that use PyMongo—in a crisis, your app servers might be restarted while your MongoDB servers are unreachable. Your applications should not throw an exception at startup, when they construct the client object. In PyMongo 3 the client can now start up disconnected; it tries to reach your servers until it succeeds.

On the other hand, if you wrote code like this to check whether mongod is up:

>>> try:
...     MongoClient()
...     print("it's working")
... except pymongo.errors.ConnectionFailure:
...     print("please start mongod")

...this will not work any more, since the constructor never throws ConnectionFailure now. Instead, choose how long to wait before giving up by setting "serverSelectionTimeoutMS":

>>> client = MongoClient(serverSelectionTimeoutMS=500)
>>> try:
...     client.admin.command('ping')
...     print("it's working")
... except pymongo.errors.ConnectionFailure:
...     print("please start mongod")

One Monitor Thread Per Server

Even during regular operations, connections may hang up or time out, and servers go down for periods; monitoring each on a separate thread keeps PyMongo abreast of changes before they cause errors. You will see fewer network exceptions than with PyMongo 2, and the new driver will recover much faster from the unexpected.

Thread Safety

Another source of fragility in PyMongo 2 was APIs that were not designed for multithreading. Too many of PyMongo's options could be changed at runtime. For example, if you created a database handle:

>>> db = client.test

...and changed the handle's read preference on a thread, the change appeared in all threads:

>>> def thread_fn():
...     db.read_preference = ReadPreference.SECONDARY

Making these options mutable encouraged such mistakes, so we made them immutable. Now you configure handles to databases and collections using thread-safe APIs:

>>> def thread_fn():
...     my_db = client.get_database(
...         'test',
...         read_preference=ReadPreference.SECONDARY)

Modernity

Last, and most satisfying to the team, we have completed our transition to modern Python. While PyMongo 2 already supported the latest version of Python 3, it did so tortuously by executing "auto2to3" on its source at install time. This made it too hard for the open source community to contribute to our code, and it led to some absurdly obscure bugs.
We have updated to a single code base that is compatible with Python 2 and 3. We had to drop support for the ancient Pythons 2.4 and 2.5; we were encouraged by recent download statistics to believe that these zombie Python versions are finally at rest.

If you're interested in learning more about the architecture of MongoDB, download the Architecture Guide.
4 Steps to Success: From Surviving with Legacy Systems to Thriving with MongoDB
Legacy data migrations imply a change in the status quo. More often than not, when an organization finally undertakes a thorough analysis of its technology landscape, it arrives at the same decision: to do nothing. It is an understandably daunting task to upgrade or replace 20+ year-old applications and their database counterparts. But there are good reasons, beyond the tri-annual hardware upgrade, to propel those legacy monoliths of the 1990s into the 21st century. Companies that prevailed—and even triumphed—in the volatile spring of 2020 were those that had transitioned to a more flexible usage model and were therefore able to adjust their business models more rapidly and reliably. MongoDB's client Sanoma was one of the winners: it was able to scale from 3,000 to 150,000 users within 24 hours, without any service interruption. Innovation and modernization go hand in hand. However, while modernization can sadly occur without innovation, the opposite is simply not possible.

A bit of history

The concept of bringing data together through online data layers (ODL) or operational data stores (ODS) isn't new or specific to MongoDB. Accessing legacy systems, bringing data together, and making it all more easily accessible was a common goal even 20 years ago, and led to the search for the golden source of truth (i.e., the definitive master source for any given entity). This search proved elusive early on due to the hurdles involved in bringing data from diverse, over-structured relational constructs to a sole target called an operational data store (ODS) or online data layer (ODL). The industry's first attempts began with object-oriented databases, then with the dead end of XML data stores. (In my personal opinion, XQuery and XPath were never meant for real developers.) After both endeavors failed, there came the wave of Apache efforts I like to call "Hadoop Solves the Planet," in which companies dumped all their structured data onto a big-data treasure trove. Unfortunately, this resulted in a data desert rather than the data lake everybody was hoping for, since organizations then had to scramble to build a concept for secondary indexing, data dictionaries, and more, on top of having to rebuild the sensible structures they had lost.

In the 2010s, the document model, in conjunction with JSON notation, emerged as the new de facto standard. MongoDB release 3.x introduced the combination of ACID (atomicity, consistency, isolation, durability) properties and support for a broad range of data types (in BSON, for those in the know). Soon, the MongoDB team started implementing additional features of relational heritage: secondary indexing, ACID transactions, aggregations and manipulations of data in situ, materialized views, joins, unions... the list goes on.

Where we are now

MongoDB documents can be enriched through different means and channels without touching existing content—the consistency of all data and data lineage is implicitly guaranteed. A typical example is the extraction of a delivery address from a supply chain application and a billing address from an enterprise resource planning system. In many cases, those two systems have different requirements. MongoDB documents simply keep both instantiations intact and can even hold multiples of each attached to one single client profile, without the need for loads and transformations, foreign keys, and all the other ingredients of the relational past. MongoDB simply adds and leverages other sources without destroying their context.
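To make this concrete, here is a minimal sketch of such a client profile as a MongoDB document; all field names are hypothetical, and both address instantiations live side by side in one document:

from pymongo import MongoClient

profiles = MongoClient().crm.client_profiles  # hypothetical collection

profiles.insert_one({
    'client_id': 4711,
    'name': 'Acme GmbH',
    # Address as delivered by the supply chain application.
    'delivery_address': {'street': 'Dock 7', 'city': 'Hamburg'},
    # Address as delivered by the ERP system, kept intact alongside.
    'billing_address': {'street': 'Main St. 1', 'city': 'Berlin',
                        'vat_id': 'DE123456789'},
})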
MongoDB delivers an ODS and ODL experience while streamlining the time-consuming journey of replacing legacy application code. The data platform of true modernization and innovation has arrived!

How your company can get here

The entire journey can be summarized in four simple steps:

- Analysis: Where do I start my data journey to drive the fastest value?
- Scaffolding: How do I get my data out of the existing platform and bridge it to the new platform?
- Coding: How do I enter the world of adjusting and adapting my application landscape?
- Innovation: Which are the easiest targets for my company to start achieving true innovation?

The following sections answer these four questions and provide you with a starting point for your journey toward a new and improved solution landscape.

Step 1: Analysis of your existing solution landscape

Data provisioning

Data provisioning—the act of bringing data from source system(s) to a target system—is actually the easy part of this step. Opinions may vary as to the very best approach, but most existing models for streaming data in real time make the process elegant and allow for a business-driven decision, from real-time replication on one end to batches of .CSV files on the other.

Application onboarding

More exciting is the application onboarding phase, inclusive of the selection and design of the initial data domains. Here, simple mechanisms derived from classic priority concepts can assist—and yes, they existed long before computers. Data domains already exist in the business logic, represented through their objects in the various programming languages. But even the most talented application developer deals with constant changes, which lead to compromises in those objects and can obfuscate the original clarity of their design, so the objects may hide in plain sight. Unearthing those gems and aligning them to the ODS is the most important step toward true legacy modernization.

The simplest solution is actually the most practical one: load an object with the existing software and persist it into a MongoDB collection. Persisting the object takes two lines of code that can be easily added. The location of those two lines (the first opens a connection to the database; the second persists the object) does not matter, as long as they run after the object is built out. This is the first time you will see the beauty of MongoDB and MQL at work. You really have to do nothing for the object itself—e.g., no decomposition or abstraction layer. MongoDB takes care of it for you. When looking at the object in the MongoDB database, e.g. using MongoDB Compass, you will realize that it already looks a lot like the domain object you wanted. The actual task of mapping objects to domains, or subsets of domains, is then mostly driven by the application use case. A minimal sketch of this "two lines of code" idea follows below.
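Here is what those two lines can look like in practice, as a PyMongo sketch; the build_client_profile function, the URI, and the collection names are hypothetical stand-ins for your legacy application's own objects:

from pymongo import MongoClient

# Your legacy code already builds this object somewhere.
client_profile = build_client_profile()  # hypothetical existing function

# Line one: open a connection to the new data platform.
collection = MongoClient('mongodb://ods-host')['ods']['client_profiles']

# Line two: persist the object as-is; no decomposition, no mapping layer.
# (A flat object maps directly; deeply nested objects may need a dict conversion.)
collection.insert_one(vars(client_profile))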
Tip: How to leverage application mapping to accelerate onboarding

In the model below, which was taken from the financial industry but can easily be adapted across industries, we identify the data domains in various applications and map their behavior to the effort it takes to locate them, as well as their importance to the app. First, each domain gets a rating for its object complexity, where "complexity" is defined by the implementation team. This is similar to the concept of "poker" in a development sprint. Second, each data domain must be located in the application content. Then, it's tally time.

As we can see in the example above, the concept of schedules looks quite easy but is superseded by the client profiles, which have a touch more application context (spoiler: those always come out on top). Based on the combination of complexity and the number of data domains affecting an application, we can now easily arrive at the model below. Agile is your friend and, assuming a certain "point capacity," the applications fall into place for their conversion schedule in a quite neutral fashion. The development team can then start with the low-hanging fruit. As soon as applications 1, 6, and 7 are ported, we're in business in a new, modern landscape. Along the journey, the domains will get cleaned up naturally, as we do not have the static corset of the RDBMS table designs.

Step 2: Scaffolding

Scaffolding is the art of building a bridge that can hold people as they cross it, then immediately dissipate once they step off. But for that critical time, it needs to hold. The same is true for the connectivity between a legacy system and a new data platform. Starting with the first sprint, we have data residing in the MongoDB data platform. If the data is limited to new applications and resides exclusively in MongoDB, nothing needs to be done. However, as shown in the client profiles example above, there may be dependencies to consider. The synchronization between the legacy database and the new MongoDB platform can be easily arranged using microservices and the same concepts used for the initial loading of data. Synchronization can also be achieved "through the gate" if only READ data is needed during the first sprint, or if you're already dealing with WRITE and the requirement to synchronize those writes back to a legacy system. There are three common options; a sketch of the last one follows the list.

- Streaming: A streaming-based solution is a great option for uni-directional, read-only operations, in the simplest way.
- Service: A simple, tiny microservice is a good option for the use case where data needs to be selectively written. It works with the document model on the MongoDB side, but can still push necessary updates back to the legacy system, and vice versa. The great news is that this service potentially exists already, as it requires nothing more than using the old database interface from the legacy application on one side and the new, easy-to-digest JSON document format on the MongoDB side. If both databases are ACID-compliant, any transaction is automatically treated as a normal application interaction on both sides.
- "Y-loader": Another option is a true "Y-loader," where all transactions are written to both databases in parallel, and the actual transaction is only considered committed when both systems report their commit and completion. Simple two-phase protocols (write to both, wait five seconds, read both to validate and, if in sync, commit to the application) are available as ready-made services through various distributed transaction coordinators, but often it's easier to use the existing data access in the application. In that case, the new data path to MongoDB runs in parallel, and a simple redundant checkpoint (which the application logic would have had for the legacy path anyway) is expanded for this purpose.
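As an illustration of that simple two-phase idea, here is a minimal Python sketch. The legacy_db interface and its methods are hypothetical, and a production version would need real error handling and idempotent retries:

import time

def y_load(record, legacy_db, mongo_collection):
    """Write one record to both systems; commit only if both agree."""
    # Write to both databases along parallel paths.
    legacy_db.insert(record)                  # hypothetical legacy interface
    mongo_collection.insert_one(dict(record))

    # Wait briefly, then read both copies back to validate.
    time.sleep(5)
    legacy_copy = legacy_db.find_by_id(record['id'])  # hypothetical
    mongo_copy = mongo_collection.find_one({'id': record['id']})

    # Commit to the application only if the two copies are in sync.
    if (legacy_copy is not None and mongo_copy is not None
            and all(legacy_copy[k] == mongo_copy[k] for k in record)):
        return True  # both sides committed; treat as success
    raise RuntimeError('Y-load validation failed; reconcile before retrying')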
Step 3: Coding

Coding against the new domain data model, with MongoDB's flexible document model as the underlying base, will immediately affect how business logic and applications are developed. The operative word is immediately. As the data gets unlocked with the initial persistence of the code object to the MongoDB collection, the developer is simultaneously able to code based on business requirements. Developers are no longer hindered by the references and requirements of object mappers. As objects are represented through the MongoDB idiomatic drivers, each programming object resides directly in the data collection; in reverse, any changes to the business logic object will be naturally represented—code-free—in the MongoDB collection.

A single blog post can't resolve all open questions and edge cases. Each application, client, and data interface is unique. Databases carry historic technical debt and implicit assumptions that get lost across generations of developers. "Do not touch this section—not sure what it does, but last time we tried, all hell broke loose..." is often-heard advice around the organizational water cooler. But the key lesson? There are many templates available and very simple methods for quickly turning a start into significant success. For example, a German client, who was stuck in a combination of IBM DB2 (mainframe and distributed) with a significant Hadoop footprint, was amazed to realize they could "lift" their data one microservice at a time. Within a single week of a proof of concept, business requirements shifted from "impossible to do" for some requested queries to "completed in under one second." This is no exception. Cases and changes like these happen daily, reinforcing Mark Twain's sage advice that "the secret of getting ahead is getting started."

Step 4: Innovation

As the migration from the legacy environment continues, innovation becomes the new focus. Unlocking previously siloed data allows immediate coupling of real-time data with machine learning platforms for various purposes: e.g., scoring for financial decision-making, personalization for retail, or optimization of production processes in an IoT context. New applications and solutions can easily be created on top of the unleashed data, even in various programming languages (again, MongoDB's idiomatic drivers do magic!), with direct real-time dashboards created with MongoDB Charts, and with different paradigms. At this point, the discussion with the product owners in your squads and tribes (trying to be real modern here) begins with the questions "What is the highest-priority component to change?" and "What function is required to enable this change?"

Is it worth waiting much longer? The real question is: why did we all not start sooner? It's time to begin integrating the list of features you always dreamed of having but never dared to pursue. The MongoDB team is here to help you get started. Reach out today and let's discuss the best path forward. To learn more about modernizing to MongoDB, click here.