MongoDB 3.4.0-rc2 is out and is ready for testing. This is the culmination of the 3.3.x development series.
Fixed in this release candidate:
- SERVER-7306 Mongod as windows service should not claim to be 'started' until it is ready to accept connections
- SERVER-18908 Secondaries unable to keep up with primary under WiredTiger
- SERVER-26420 Make internal clients identify themselves in the isMaster handshake
- SERVER-26514 Create command should take idIndex option
- SERVER-26648 Tolerate bad collection metadata produced on version 2.4 or earlier
- SERVER-26652 Invalid definitions in systemd configuration for debian
- WT-1592 Dump detailed cache information via statistics
- WT-2954 Inserting multi-megabyte values can cause large in-memory pages
As always, please let us know of any issues.
-- The MongoDB Team
Microservices Webinar Recap
Recently, we held a webinar discussing microservices, and how two companies, Hudl and UPS i-parcel, leverage MongoDB as the database powering their microservices environment. There have been a number of theoretical and vendor-led discussions about microservices over the past couple of years. We thought it would be of value to share with you real world insights from companies who have actually adopted microservices, as well as answers to questions we received from the audience during the live webinar. Jon Dukulil is the VP of Engineering from Hudl and Yursil Kidwai is the VP of Technology from UPS i-parcel. How are Microservices different from Service Oriented Architectures (SOAs) utilizing SOAP/REST with an Enterprise Service Bus (ESB)? Microservices and SOAs are related in that both approaches distribute applications into individual services. Where they differ though, is the scope of the problem they address today. SOAs aim for flexibility at the enterprise IT level. This can be a complex undertaking as SOAs only work when the underlying services do not need to be modified. Microservices represent an architecture for an individual service, and aim at facilitating continous delivery and parallel development of multiple services. The following graphic highlights some of the differences. One significant difference between SOAs and microservices revolves around the messaging system, which coordinates and synchronizes communication between different services in the application. Enterprise service buses (ESB) emerged as a solution for SOAs because of the need for service integration and a central point of coordination. As ESBs grew in popularity, enterprise vendors packaged more and more software and smarts into the middleware, making it difficult to decouple the different services that relied on the ESB for coordination. Microservices keep the messaging middleware focused on sharing data and events, and enabling more of the intelligence at the endpoints. This makes it easier to decouple and separate individual services. How big should a microservice be? There are many differing opinions about how large a microservice should be, thus it really depends on your application needs. Here is how Hudl and UPS i-parcel approach that question. Jon Dukulil (Hudl) : We determine how big our microservice should be the amount of work that can be completed by a squad. For us, a squad is a small completely autonomous team. It consists of 4 separate functions: product manager, developer, UI designer, and QA. When we are growing headcount we are not thinking of growing larger teams, we are thinking of adding more squads. !(https://webassets.mongodb.com/_com_assets/cms/Microservices_MongoDB_Blog2-a6l74owk23.png) Yursil Kidwai (UPS i-parcel) : For us, we have defined microservice as a single verb (e.g. Billing), and are constantly challenging ourselves on how that verb should be defined. We follow the “two pizza” rule, in which a team should never be larger than what you can feed with two large pizzas. Whatever our “two pizza” team can deliver in one week is what we consider to be the right size for a microservice. Why should I decouple databases in a microservices environment? Can you elaborate on this? One of the core principles behind microservices is strong cohesion (i.e. related code grouped together) and loose coupling (i.e. a change to one service should not require a change to another). With a shared database architecture both these principles are lost. Consumers are tied to a specific technology choice, as well as particular database implementation. Application logic may also be spread among multiple consumers. If a shared piece of information needs to be edited, you might need to change the behavior in multiple places, as well as deploy all those changes. Additionally, in a shared database architecture a catastrophic failure with the infrastructure has the potential to affect multiple microservices and result in a substantial outage. Thus, it is recommended to decouple any shared databases so that each microservice has its own database. Due to the distributed nature of microservices, there are more failure points. Because of all these movable parts in microservices, how do you deal with failures to ensure you meet your SLAs? Jon Dukulil (Hudl) : For us it’s an important point. By keeping services truly separate where they share as little as possible, that definitely helps. You’ll hear people working with microservices talk about “minimizing the blast radius” and that’s what I mean by the separation of services. When one service does have a failure it doesn’t take everything else down with it. Another thing is that when you are building out your microservices architecture, take care of the abstractions that you create. Things in a monolith that used to be a function call are now a network call, so there are many more things that can fail because of that: networks can timeout, network partitions, etc. Our developers are trained to think about what happens if we can’t complete the call. For us, it was also important to find a good circuit breaker framework and we actually wrote our own .NET version of a framework that Netflix built called Hystrix. That has been pretty helpful to isolate points of access between services and stop failures from cascading. Yursil Kidwai (UPS i-parcel) : One of the main approaches we took to deal with failures and dependencies was the choice to go with MongoDB. The advantage for us is MongoDB’s ability to deploy a single replica set across multiple regions. We make sure our deployment strategy always includes multiple regions to create that high availability infrastructure. Our goal is to always be up, and the ability of MongoDB’s replica sets to very quickly recover from failures is key to that. Another approach was around monitoring. We built our own monitoring framework that we are reporting on with Datadog. We have multiple 80 inch TVs displaying dashboards of the health of all our microservices. The dashboards are monitoring the throughput of the microservices on a continual basis, with alerts to our ops team configured if the throughput for a service falls below an acceptable threshold level. Finally, it’s important for the team to be accountable. Developers can’t just write code and not worry about, but they own the code from beginning to end. Thus, it is important for developers to understand the interdependencies between DevOps, testing, and release in order to properly design a service. Why did you choose MongoDB and how does it fit in with your architecture? Jon Dukulil (Hudl) : One, from a scaling perspective, we have been really happy with MongoDB’s scalability. We have many small databases and a couple of very large databases. Our smallest database today is serving up just 9MB of data. This is pretty trivial so we need these small databases to run on cost effective hardware. Our largest database is orders of magnitude larger and is spread over 8 shards. The hardware needs of those different databases are very different, but they are both running on MongoDB. Fast failovers are another big benefit for us. It’s fully automated and it’s really fast. Failovers are in the order of 1-5 seconds for us, and the more important thing is they are really reliable. We’ve never had an issue where a failover hasn’t gone well. Lastly, since MongoDB has a dynamic schema, for us that means that the code is the schema. If I’m working on a new feature and I have a property that last week was a string, but this week I want it to be an array of strings, I update my code and I’m ready to go. There isn’t much more to it than that. Yursil Kidwai (UPS i-parcel) : In many parts of the world, e-commerce rules governing cross border transaction are still changing and thus our business processes in those areas are constantly being refined. To handle the dynamic environment that our business operates in, the requirement to change the schema was paramount to us. For example, one country may require a tax identification number, while another country may suddenly decide it needs your passport, as well as some other classification number. As these changes are occurring, we really need something behind us that will adapt with us and MongoDB’s dynamic schema gave us the ability to quickly experiment and respond to our ever changing environment. We also needed the ability to scale. We have 20M tracking events across 100 vendors processed daily, as well as tens of thousands of new parcels that enter into our system every day. MongoDB’s ability to scale-out on commodity hardware and its elastic scaling features really allowed us to handle any unexpected inflows. Next Steps To understand more about the business level drivers and architectural requirements of microservices, read Microservices: Evolution of Building Modern Apps Whitepaper . For a technical deep dive into microservices and containers, read Microservices: Containers and Orchestration Whitepaper
How to Get Started with MongoDB Atlas and Confluent Cloud
Every year more and more applications are leveraging the public cloud and reaping the benefits of elastic scale and rapid provisioning. Forward-thinking companies such as MongoDB and Confluent have embraced this trend, building cloud-based solutions such as MongoDB Atlas and Confluent Cloud that work across all three major cloud providers. Companies across many industries have been leveraging Confluent and MongoDB to drive their businesses forward for years. From insurance providers gaining a customer-360 view for a personalized experience to global retail chains optimizing logistics with a real-time supply chain application, the connected technologies have made it easier to build applications with event-driven data requirements. The latest iteration of this technology partnership simplifies getting started with a cloud-first approach, ultimately improving developer’s productivity when building modern cloud-based applications with data in motion. Today, the MongoDB Atlas source and sink connectors are generally available within Confluent Cloud. With Confluent’s cloud-native service for Apache Kafka® and these fully managed connectors, setup of your MongoDB Atlas integration is simple. There is no need to install Kafka Connect or the MongoDB Connector for Apache Kafka, or to worry about scaling your deployment. All the infrastructure provisioning and management is taken care of for you, enabling you to focus on what brings you the most value — developing and releasing your applications rapidly. Let’s walk through a simple example of taking data from a MongoDB cluster in Virginia and writing it into a MongoDB cluster in Ireland. We will use a python application to write fictitious data into our source cluster. Step 1: Set up Confluent Cloud First, if you’ve not done so already, sign up for a free trial of Confluent Cloud . You can then use the Quick Start for Apache Kafka using Confluent Cloud tutorial to create a new Kafka cluster. Once the cluster is created, you need to enable egress IPs and copy the list of IP addresses. This list of IPs will be used as an IP Allow list in MongoDB Atlas. To locate this list, select “Custer Settings” and then the “Networking” tab. Keep this tab open for future reference: you will need to copy these IP addresses into the Atlas cluster in Step 2. Step 2: Set Up the Source MongoDB Atlas Cluster For a detailed guide on creating your own MongoDB Atlas cluster, see the Getting Started with Atlas tutorial. For the purposes of this article, we have created an M10 MongoDB Atlas cluster using the AWS cloud in the us-east-1 (Virginia) data center to be used as the source, and an M10 MongoDB Atlas cluster using the AWS cloud in the eu-west-1 (Ireland) data center to be used as the sink. Once your clusters are created, you will need to configure two settings in order to make a connection: database access and network access. Network Access You have two options for allowing secure network access from Confluent Cloud to MongoDB Atlas: You can use AWS PrivateLink, or you can secure the connection by allowing only specific IP connections from Confluent Cloud to your Atlas cluster. In this article, we cover securing via IPs. For information on setting up using PrivateLink, read the article Using the Fully Managed MongoDB Atlas Connector in a Secure Environment . To accept external connections in MongoDB Atlas via specific IP addresses, launch the “IP Access List” entry dialog under the Network Access menu. Here you add all the IP addresses that were listed in Confluent Cloud from Step 1. Once all the egress IPs from Confluent Cloud are added, you can configure the user account that will be used to connect from Confluent Cloud to MongoDB Atlas. Configure user authentication in the Database Access menu. Database Access You can authenticate to MongoDB Atlas using username/password, certificates, or AWS identity and access management (IAM) authentication methods. To create a username and password that will be used for connection from Confluent Cloud, select the “+ Add new Database User” option from the Database Access menu. Provide a username and password and make a note of this credential, because you will need it in Step 3 and Step 4 when you configure the MongoDB Atlas source and sink connectors in Confluent Cloud. Note: In this article we are creating one credential and using it for both the MongoDB Atlas source and MongoDB sink connectors. This is because both of the clusters used in this article are from the same Atlas project. Now that the Atlas cluster is created, the Confluent Cloud egress IPs are added to the MongoDB Atlas Allow list, and the database access credentials are defined, you are ready to configure the MongoDB Atlas source and MongoDB Atlas sink connectors in Confluent Cloud. Step 3: Configure the Atlas Source Now that you have two clusters up and running, you can configure the MongoDB Atlas connectors in Confluent Cloud. To do this, select “Connectors” from the menu, and type “MongoDB Atlas” in the Filters textbox. Note: When configuring MongoDB Atlas source And MongoDB Atlas sink, you will need the connection host name of your Atlas clusters. You can obtain this host name from the MongoDB connection string. An easy way to do this is by clicking on the "Connect" button for your cluster. This will launch the Connect dialog. You can choose any of the Connect options. For purposes of illustration, if you click on “Connect using MongoDB Compass.” you will see the following: The highlighted part in the above figure is the connection hostname you will use when configuring the source and sink connectors in Confluent Cloud. Configuring the MongoDB Atlas Source Connector Selecting “MongoDbAtlasSource” from the list of Confluent Cloud connectors presents you with several configuration options. The “Kafka Cluster credentials” choice is an API-based authentication that the connector will use for authentication with the Kafka broker. You can generate a new API key and secret by using the hyperlink. Recall that the connection host is obtained from the MongoDB connection string. Details on how to find this are described at the beginning of this section. The “Copy existing data” choice tells the connector upon initial startup to copy all the existing data in the source collection into the desired topic. Any changes to the data that occur during the copy process are applied once the copy is completed. By default, messages from the MongoDB source are sent to the Kafka topic as strings. The connector supports outputting messages in formats such as JSON and AVRO. Recall that the MongoDB source connector reads change stream data as events. Change stream event metadata is wrapped in the message sent to the Kafka topic. If you want just the message contents, you can set the “Publish full document only” output message to true. Note: For source connectors, the number of tasks will always be “1”: otherwise you will run the risk of duplicate data being written to the topic, because multiple workers would effectively be reading from the same change stream event stream. To scale the source, you could create multiple source connectors and define a pipeline that looks at only a portion of the collection. Currently this capability for defining a pipeline is not yet available in Confluent Cloud. Step 4: Generate Test Data At this point, you could run your python data generator application and start inserting data into the Stocks.StockData collection at your source. This will cause the connector to automatically create the topic “demo.Stocks.StockData.” To use the generator, git-clone the stockgenmongo folder in the above-referenced repository and launch the data generation as follows: python stockgen.py -c "< >" Where the MongoDB connection URL is the full connection string obtained from the Atlas source cluster. An example connection string is as follows: mongodb+srv://kafkauser:firstname.lastname@example.org Note: You might need to pip-install pymongo and dnspython first. If you do not wish to use this data generator, you will need to create the Kafka topic first before configuring the MongoDB Atlas sink. You can do this by using the Add a Topic dialog in the Topics tab of the Confluent Cloud administration portal. Step 5: Configuring the MongoDB Atlas Sink Selecting “MongoDB Atlas Sink” from the list of Confluent Cloud connectors will present you with several configuration options. After you pick the topic to source data from Kafka, you will be presented with additional configuration options. Because you chose to write your data in the source by using JSON, you need to select “JSON” in the input message format. The Kafka API key is an API key and secret used for connector authentication with Confluent Cloud. Recall that you obtain the connection host from the MongoDB connection string. Details on how to find this are described previously at the beginning of Step 3. The “Connection details” section allows you to define behavior such as creating a new document for every topic message or updating an existing document based upon a value in the message. These behaviors are known as document ID and write model strategies. For more information, check out the MongoDB Connector for Apache Kafka sink documentation . If order of the data in the sink collection is not important, you could spin up multiple tasks to gain an increase in write performance. Step 6: Verify Your Data Arrived at the Sink You can verify the data has arrived at the sink via the Atlas web interface. Navigate to the collection data via the Collections button. Now that your data is in Atlas, you can leverage many of the Atlas platform capabilities such as Atlas Search, Atlas Online Archive for easy data movement to low-cost storage, and MongoDB Charts for point-and-click data visualization. Here is a chart created in about one minute using the data generated from the sink cluster. Summary Apache Kafka and MongoDB help power many strategic business use cases, such as modernizing legacy monolithic systems, single views, batch processing, and event-driven architectures, to name a few. Today, Confluent and MongoDB Cloud and MongoDB Atlas provide fully managed solutions that enable you to focus on the business problem you are trying to solve versus spinning your tires in infrastructure configuration and maintenance. Register for our joint webinar to learn more!