MongoDB Developer

Coding with MongoDB - news for developers, tips and deep dives

MongoDB Query API Webinar: FAQ

Last week we held a live webinar on the MongoDB Query API and our lineup of idiomatic programming language drivers. There were many great questions during the session, and in this post, what I want to do is share the most frequently asked ones with you. But first - here is a quick summary of what MongoDB Query API is all about if you are unfamiliar with it. What is MongoDB Query API? MongoDB is built upon the document data model . The document model is designed to be intuitive, flexible, universal, and powerful. You can easily work with a variety of data, and because documents map directly to the objects in your code, it fits naturally in your app development experience. MongoDB Query API lets you work with data as code and build any class of application faster by giving you extensive query capabilities natively in any modern programming language. Whether you’re working with transactional data, looking for search capabilities, or trying to run sophisticated real-time analytics, MongoDB Query API can meet your needs. MongoDB Query API has some unique features like its expressive query, primary and secondary indexes, powerful aggregations and transformations, on-demand materialized views, and more — enabling you to work with data of any structure, at any scale. Some key features to highlight: Indexes To optimize any workload and query pattern you can take advantage of a large set of index types like multi-key (for arrays), wildcard, geospatial, and more and index any field no matter how deeply nested it is within your documents. Fully featured secondary indexes are document-optimized and include partial, unique, case insensitive, and sparse. Aggregation Pipeline Aggregation pipeline lets you group, transform, and analyze your data to support any class of workload. You can choose from dozens of aggregation stages and over 200 operators to build modular and expressive pipelines. You can also use low-code tools like MongoDB Compass to drag and drop stages, examine intermediate output, and export to your programming language of choice. On-Demand Materialized Views The powerful $merge aggregation stage allows you to combine the results of your aggregation pipeline with existing collections to update and enrich data without having to recompute your entire data set. You can output results to sharded and unsharded collections while simultaneously defining indexes on each view Geospatial and Graph Utilize MongoDB’s built-in natively ability to store and run queries against geospatial data Use operators like $graphLookup to quickly traverse connected data sets These are just a few of the features we highlighted in the MongoDB Query API webinar. No matter what type of application you are thinking of building or managing, MongoDB Query API can meet your needs as the needs of your users and application change. FAQs for MongoDB Query API Here are the most common questions asked during the webinar: Do we have access to the data sets presented in this webinar? Yes, you can easily create a cluster and load the sample data sets into Atlas. Instructions on how to get started are here . How can I access full-text search capabilities? Text search is a standard feature of MongoDB Atlas. You can go to cloud.mongodb.com to try it out using sample data sets. Does VS code plugin support Aggregation? Yes, it does. You can learn more about the VS code plugin on our docs page. If you need to pass variable values in the aggregation, say the price range from the app as an input, how would you do that? This is no different than sending a query - since you construct your aggregation in your application you just fill in the field you want with value/variable in your code. Is there any best practice document on MongoDB query API to have stable performance and utilize minimum resources? Yes, we have tips and tricks on optimizing performance by utilizing indexes, filters, and tools here . Does MongoDB support the use of multiple different indexes to meet the needs of a single query? Yes, this can be accomplished by the use of compound indexes. You can learn more about it in our docs here . If you work with big data and create a collection, is it smarter to create indexes first or after the collection is filled (regarding the time to create a collection)? It is better to create the indexes first as they will take less time to create if the collection is empty, but you still have an option to create the index once the data is there in the collection. There are multiple great benefits of MongoDB’s indexing capabilities: When building indexes, there is no impact on your app’s availability since the index operation is online. Flexibility to add and remove indexes at any time. Ability to hide indexes to evaluate the impact of removing them before officially dropping them. Where do I go to learn more? Here are some resources to help you get started: MongoDB Query API page MongoDB University MongoDB Docs You can also check out the webinar replay here .

May 7, 2021
Developer

How to Get Started with MongoDB Atlas and Confluent Cloud

Every year more and more applications are leveraging the public cloud and reaping the benefits of elastic scale and rapid provisioning. Forward-thinking companies such as MongoDB and Confluent have embraced this trend, building cloud-based solutions such as MongoDB Atlas and Confluent Cloud that work across all three major cloud providers. Companies across many industries have been leveraging Confluent and MongoDB to drive their businesses forward for years. From insurance providers gaining a customer-360 view for a personalized experience to global retail chains optimizing logistics with a real-time supply chain application, the connected technologies have made it easier to build applications with event-driven data requirements. The latest iteration of this technology partnership simplifies getting started with a cloud-first approach, ultimately improving developer’s productivity when building modern cloud-based applications with data in motion. Today, the MongoDB Atlas source and sink connectors are generally available within Confluent Cloud. With Confluent’s cloud-native service for Apache Kafka® and these fully managed connectors, setup of your MongoDB Atlas integration is simple. There is no need to install Kafka Connect or the MongoDB Connector for Apache Kafka, or to worry about scaling your deployment. All the infrastructure provisioning and management is taken care of for you, enabling you to focus on what brings you the most value — developing and releasing your applications rapidly. Let’s walk through a simple example of taking data from a MongoDB cluster in Virginia and writing it into a MongoDB cluster in Ireland. We will use a python application to write fictitious data into our source cluster. Step 1: Set up Confluent Cloud First, if you’ve not done so already, sign up for a free trial of Confluent Cloud . You can then use the Quick Start for Apache Kafka using Confluent Cloud tutorial to create a new Kafka cluster. Once the cluster is created, you need to enable egress IPs and copy the list of IP addresses. This list of IPs will be used as an IP Allow list in MongoDB Atlas. To locate this list, select “Custer Settings” and then the “Networking” tab. Keep this tab open for future reference: you will need to copy these IP addresses into the Atlas cluster in Step 2. Step 2: Set Up the Source MongoDB Atlas Cluster For a detailed guide on creating your own MongoDB Atlas cluster, see the Getting Started with Atlas tutorial. For the purposes of this article, we have created an M10 MongoDB Atlas cluster using the AWS cloud in the us-east-1 (Virginia) data center to be used as the source, and an M10 MongoDB Atlas cluster using the AWS cloud in the eu-west-1 (Ireland) data center to be used as the sink. Once your clusters are created, you will need to configure two settings in order to make a connection: database access and network access. Network Access You have two options for allowing secure network access from Confluent Cloud to MongoDB Atlas: You can use AWS PrivateLink, or you can secure the connection by allowing only specific IP connections from Confluent Cloud to your Atlas cluster. In this article, we cover securing via IPs. For information on setting up using PrivateLink, read the article Using the Fully Managed MongoDB Atlas Connector in a Secure Environment . To accept external connections in MongoDB Atlas via specific IP addresses, launch the “IP Access List” entry dialog under the Network Access menu. Here you add all the IP addresses that were listed in Confluent Cloud from Step 1. Once all the egress IPs from Confluent Cloud are added, you can configure the user account that will be used to connect from Confluent Cloud to MongoDB Atlas. Configure user authentication in the Database Access menu. Database Access You can authenticate to MongoDB Atlas using username/password, certificates, or AWS identity and access management (IAM) authentication methods. To create a username and password that will be used for connection from Confluent Cloud, select the “+ Add new Database User” option from the Database Access menu. Provide a username and password and make a note of this credential, because you will need it in Step 3 and Step 4 when you configure the MongoDB Atlas source and sink connectors in Confluent Cloud. Note: In this article we are creating one credential and using it for both the MongoDB Atlas source and MongoDB sink connectors. This is because both of the clusters used in this article are from the same Atlas project. Now that the Atlas cluster is created, the Confluent Cloud egress IPs are added to the MongoDB Atlas Allow list, and the database access credentials are defined, you are ready to configure the MongoDB Atlas source and MongoDB Atlas sink connectors in Confluent Cloud. Step 3: Configure the Atlas Source Now that you have two clusters up and running, you can configure the MongoDB Atlas connectors in Confluent Cloud. To do this, select “Connectors” from the menu, and type “MongoDB Atlas” in the Filters textbox. Note: When configuring MongoDB Atlas source And MongoDB Atlas sink, you will need the connection host name of your Atlas clusters. You can obtain this host name from the MongoDB connection string. An easy way to do this is by clicking on the "Connect" button for your cluster. This will launch the Connect dialog. You can choose any of the Connect options. For purposes of illustration, if you click on “Connect using MongoDB Compass.” you will see the following: The highlighted part in the above figure is the connection hostname you will use when configuring the source and sink connectors in Confluent Cloud. Configuring the MongoDB Atlas Source Connector Selecting “MongoDbAtlasSource” from the list of Confluent Cloud connectors presents you with several configuration options. The “Kafka Cluster credentials” choice is an API-based authentication that the connector will use for authentication with the Kafka broker. You can generate a new API key and secret by using the hyperlink. Recall that the connection host is obtained from the MongoDB connection string. Details on how to find this are described at the beginning of this section. The “Copy existing data” choice tells the connector upon initial startup to copy all the existing data in the source collection into the desired topic. Any changes to the data that occur during the copy process are applied once the copy is completed. By default, messages from the MongoDB source are sent to the Kafka topic as strings. The connector supports outputting messages in formats such as JSON and AVRO. Recall that the MongoDB source connector reads change stream data as events. Change stream event metadata is wrapped in the message sent to the Kafka topic. If you want just the message contents, you can set the “Publish full document only” output message to true. Note: For source connectors, the number of tasks will always be “1”: otherwise you will run the risk of duplicate data being written to the topic, because multiple workers would effectively be reading from the same change stream event stream. To scale the source, you could create multiple source connectors and define a pipeline that looks at only a portion of the collection. Currently this capability for defining a pipeline is not yet available in Confluent Cloud. Step 4: Generate Test Data At this point, you could run your python data generator application and start inserting data into the Stocks.StockData collection at your source. This will cause the connector to automatically create the topic “demo.Stocks.StockData.” To use the generator, git-clone the stockgenmongo folder in the above-referenced repository and launch the data generation as follows: python stockgen.py -c "< >" Where the MongoDB connection URL is the full connection string obtained from the Atlas source cluster. An example connection string is as follows: mongodb+srv://kafkauser:kafkapassword123@democluster.lkyil.mongodb.net Note: You might need to pip-install pymongo and dnspython first. If you do not wish to use this data generator, you will need to create the Kafka topic first before configuring the MongoDB Atlas sink. You can do this by using the Add a Topic dialog in the Topics tab of the Confluent Cloud administration portal. Step 5: Configuring the MongoDB Atlas Sink Selecting “MongoDB Atlas Sink” from the list of Confluent Cloud connectors will present you with several configuration options. After you pick the topic to source data from Kafka, you will be presented with additional configuration options. Because you chose to write your data in the source by using JSON, you need to select “JSON” in the input message format. The Kafka API key is an API key and secret used for connector authentication with Confluent Cloud. Recall that you obtain the connection host from the MongoDB connection string. Details on how to find this are described previously at the beginning of Step 3. The “Connection details” section allows you to define behavior such as creating a new document for every topic message or updating an existing document based upon a value in the message. These behaviors are known as document ID and write model strategies. For more information, check out the MongoDB Connector for Apache Kafka sink documentation . If order of the data in the sink collection is not important, you could spin up multiple tasks to gain an increase in write performance. Step 6: Verify Your Data Arrived at the Sink You can verify the data has arrived at the sink via the Atlas web interface. Navigate to the collection data via the Collections button. Now that your data is in Atlas, you can leverage many of the Atlas platform capabilities such as Atlas Search, Atlas Online Archive for easy data movement to low-cost storage, and MongoDB Charts for point-and-click data visualization. Here is a chart created in about one minute using the data generated from the sink cluster. Summary Apache Kafka and MongoDB help power many strategic business use cases, such as modernizing legacy monolithic systems, single views, batch processing, and event-driven architectures, to name a few. Today, Confluent and MongoDB Cloud and MongoDB Atlas provide fully managed solutions that enable you to focus on the business problem you are trying to solve versus spinning your tires in infrastructure configuration and maintenance. Register for our joint webinar to learn more!

May 6, 2021
Developer

Use x509 certificate-based authentication with MongoDB and Apache Kafka

Kafka has emerged as a popular event streaming platform. The inherent "pub/sub" model can be viewed as a method for moving data between systems. As such, MongoDB offers a Kafka connector , enabling Kafka topics to be copied into a MongoDB cluster (the sink). Similarly, the connector enables data movement from a MongoDB cluster (the source) into Kafka topics. To access data securely, certificate-based X.509 authentication is a natural choice for server-to-server authentication scenarios with Kafka and MongoDB. Certificates avoid having to store or manage usernames and passwords when used with database connection strings. For example, such user credentials could be inadvertently exposed if "hard-coded" in configuration files or other uses. An X.509 certificate is a structured, binary record. This record consists of several key and value pairs. X.509 certificates use the widely accepted international X.509 public key infrastructure (PKI) standard. The use of certificates prevents user credential exposure. Authentication requests with certificates verifies that any public key presented by a client or another member of the cluster belongs to that client or member. The X.509 certificate method for authentication is more secure than conventional password-based certification because each server machine needs their own dedicated key to participate in the cluster. For use with secure TLS/SSL connections, MongoDB supports X.509 certificate authentication allowing clients to use public key infrastructure in lieu of SCRAM (username and password). The certificate encodes two very important pieces of information: the server's public key and a digital signature that can be used to confirm the certificate's authenticity. Additionally, the certificate will include metadata used by the Certificate Authority to track the certificate and provide guidelines on how the public key can be used. Using the server's public key, the client and server are able to negotiate a shared symmetric key securely, which can be used to secure communications. Users can either generate their own certificates and keys (self-managed) or use the Atlas PKI. In either case, first a project-specific CA private and public key is generated, and then a per-user private key and signed X.509 identity certificate is created. If using self-managed X.509 infrastructure , you'll need to upload your CA public key certificate into your Atlas project. If using Atlas-managed X.509 infrastructure, you'll need to download the project private key and provide that to your Kafka Connect service. This signed certificate is then pushed to each server member in your Atlas cluster. The below diagram shows the deployment of a standard 3 node replica set and client using x.509 authentication: In non-production environments, the basic SCRAM authentication method may be most suitable. However, for production environments or server-server scenarios such as a Kafka-MongoDB integration, X.509 authentication is the recommended mechanism. To use X.509 certification for server-server authentication, first confirm that you are able to authenticate to an Atlas cluster using X.509 certificates. Then follow the steps below. Prerequisites: Openssl must be installed Project-level CA & user certificates created in PEM format If using Atlas-managed certificates, user-specific client certificate (see X.509 tab: https://docs.atlas.mongodb.com/security-add-mongodb-users/#database-user-authentication ) If using self-managed X.509 auth, you will need to create & upload your CA public key to Atlas (see https://docs.atlas.mongodb.com/security-self-managed-x509/ ), and have a user-specific client certificate ready Ensure that you have installed the MongoDB Kafka Connector and understand how to use it with Kafka Connect. Then follow these steps: Obtain the client user certificate from your system administrator (or from Atlas). In this example, the user certificate is stored in PEM file kafkaclient-X509-cert.pem and will be associated with the Atlas database user kafka-svc . Convert the PEM file to a password-protected PKCS12 formatted certificate by running this command: openssl pkcs12 -export -in kafkaclient-X509-cert.pem -out kafkaclient-X509-cert.p12 -password pass:mypassword Copy PKCS12 certificate ( kafkaclient-x509-cert.p12 ) to the server where Kafka Connect is running. Note the full path of the PKCS12 certificate location. Update the Kafka Connect configuration in the KAFKA_OPTS environment variable: export KAFKA_OPTS="-Djavax.net.ssl.keyStore=<path to kafkaclient-x509-cert.p12> -Djavax.net.ssl.keyStorePassword=mypassword -Djavax.net.ssl.keyStoreType=PKCS12" Restart Kafka Connect Update the MongoDB Connector configuration to use a connection URI with the following parameter options: Connection.uri: "mongodb+srv://<mongodb-host>/test?authSource=%24external&authMechanism=MONGODB-X509&subjectName=kafka-svc" Re-deploy the MongoDB connector using the Kafka Connect REST API, with the above configuration for the connection URI. Download the latest MongoDB Connector for Apache Kafka 1.5 from the Confluent Hub ! Read the MongoDB Connector for Apache Kafka documentation . Questions/Need help with the connector? Ask the Community .

May 5, 2021
Developer

Reducing Queue Times by Using Speculative Execution

When solving concurrency problems in software, the simplest solution is often to make the trickiest part of the problem serial. Here at MongoDB, this is exactly the approach we took to implement a commit queue, where engineers submit code changes to be tested and then merged into a repository. This worked well for many smaller repositories, but for large ones such as the MongoDB Server , testing submissions one at a time proved to be too slow, with engineers sometimes waiting hours for their code to finally make it into the repository. To solve this challenge, we introduced some speculative execution on top of our original approach, which reduced the wait time for a typical week by 62%. Background Many of the engineers at MongoDB submit their code changes to a commit queue, which runs a basic set of tests on these changes before merging them to the correct repository. The main difference between the commit queue running the tests and an engineer running the tests is that the commit queue tests with the latest changes to the code base, whereas an engineer has checked out the code base at some point in the past. To ensure that it has the latest changes, the commit queue tests only one set of changes at a time before either merging the changes if the tests pass, or rejecting them and notifying the author if the tests fail. This serial approach makes the system easy to understand, but it also presents an optimization opportunity to reduce the time spent waiting for tests to start. Design Approach Parallelization The only part of this system we needed to keep serial was the part that merged changes into the repository, because this ensures that changes would be merged in the same order in which they were submitted. By far, the slowest part of the commit queue is actually running the tests, and this is the work that we wanted to split among multiple machines. Let’s assume as an example that all submissions to the commit queue take 10 minutes to run. Let’s also assume that in one day there are 30 submissions to the commit queue at roughly the same time. With the previous requirement that the queue runs serially, this means it would take 300 minutes to get through all the submissions. If we parallelize testing the submissions among 30 machines, it would take only 10 minutes of actual time from when the last change was submitted to get through all the submissions. Speculative Execution With a serial queue, each successful submission checks out the latest code in the repository, applies its changes, runs tests, then commits the code back to the repository before the next submission starts. If we do these steps in parallel however, checking out the latest code in the repository will not include the changes from submissions that would have merged before the one being tested. Parallelizing our tests requires some extra steps to ensure that submissions run tests with the code changes from prior submissions. In order to know what code changes should be applied to which tests, the commit queue must still maintain the concept of an order for each submission. That way, the third entry in the queue will know that it must apply the changes from the first and second entries, in addition to its own code changes. If any test for a submission fails, it’s rejected from the queue and any submissions after it are rerun without the code changes from the one that failed. If all tests for a submission finish running, the submission will wait to be merged until the one immediately in front of it is merged. Performance Considerations Testing with merged code changes like this requires that most of the tests pass; otherwise the system will do a lot more work than it would have done if it tested submissions one at a time, and we lose all the benefits from parallelism. In the worst-case scenario where nothing passes, the nth submission in the queue would need to be restarted each time something in front of it fails, leading to total times that any submission is run. This means that if engineers add 10 submissions to the commit queue, the new parallel approach runs tests as many as 55 times, whereas with the old serial approach the tests would always run 10 times. Maybe this worst-case scenario isn’t a big deal if the majority of submissions pass (and at MongoDB, 85% of them do). However we’d like to guarantee that an unusually bad day doesn’t make the machines running the tests do an excessive amount of unneeded work. To make this guarantee possible, we inserted a checkpoint into the queue, so that only the batch of submissions in front of the checkpoint are running tests. In the example of 10 submissions to the queue, placing the checkpoint after submission No. 3 would mean that the first three submissions start running tests while submissions No. 4 and later wait until the first three finish. It’s totally possible that everything still fails, but adding this checkpoint prevents us from doing too much extra work. With the checkpoint, a queue of length n would run: total submissions, where f is the position of the checkpoint, \ is the integer division operator, and % is the modulus operator. If engineers add 10 submissions to the queue and the checkpoint is after submission No. 3, this hybrid approach would run tests as many as 19 times, compared with 55 with a fully parallel and 10 with a fully serial approach. The following infographic helps visualize this example. Colors represent the current status of the submission: green means successful, red means failed, yellow means in progress, and gray means not yet started. Results The graph below depicts the average length of time a submission would wait before it started running tests for a representative week when processing submissions serially. Contrast these times with the graph below, which shows a representative week with the hybrid approach. For the depicted weeks, the overall average time dropped from 1,238 seconds with the serial approach to 469 seconds with the hybrid approach — a reduction of 62%. Conclusion With this hybrid approach of parallelizing the longest-running parts of the system but keeping key parts serial, we were able to reap the benefits of each approach. We saw drastic reductions in wait times while still maintaining the concept of an ordering for our commit queue. What led us to this approach were the requirements that the result should be noticeably faster for typical sizes of the problem (a queue with one to nine submissions), but could not be drastically slower in the worst-case scenario. These two guiding principles will often yield to designs that work well in real-world scenarios, even though they may not handle all edge cases gracefully.

April 27, 2021
Developer

Accelerate Data Modernization with Infosys Data Model Converter

Are you in the process of migrating applications from a relational database to MongoDB? If so, you’re likely trying to best understand and decide how your enterprise data needs to be modeled. Our previous blog discussed how Infosys Data Services Suite can help enterprises move data seamlessly from legacy relational databases to MongoDB. But moving data is only one part of the puzzle. The more significant step is choosing the target data model, or schema design, a process that usually requires several hours of highly skilled talent. That’s why we created this follow-up blog to help you get started. Rethinking Schema Design Ultimately, schema design can be the difference between an inefficient, disorganized database and a strategic one that empowers the entire company. Schema design in MongoDB requires a change in perspective for data architects, developers, and database administrators. They have to: Rethink the legacy relational data model. This model flattens data into rigid two-dimensional tabular structures of rows and columns. The new data model is a rich and dynamic one with embedded sub-documents and arrays Rethink how the data platform works. In relational databases, it is extremely difficult to change the data platform as the application evolves. However, in MongoDB, the apps and APIs come first and the data platform dynamically accommodates the data Getting Schema Design Right Begin the schema design process by considering the application’s requirements. You’ll want to model the data in a way that leverages the flexibility of the document model. In schema migrations, it may seem easy at first to simply mirror the flat schema of the relational database in the document model. However, this negates the advantages enabled by the rich and embedded data structures of the document model. For example, data that belongs to a parent-child relationship in two RDBMS tables can be collapsed (embedded) into a single document in MongoDB. The application data access patterns should also drive schema design with a specific focus on: The read/write ratio of database operations and whether it is more important to optimize the performance of one operation over another The types of queries and updates performed by the databases The lifecycle of the data and growth rate of documents Simplifying Schema Design with Infosys Data Model Converter Infosys has developed a solution called Infosys Data Model Convertor that processes source relational schema and the above-mentioned signals as inputs and automatically provides target MongoDB schema suggestions. Infosys Data Model Converter is available as part of Infosys Modernization Suite which accelerates enterprises’ modernization journey. Each schema suggestion is accompanied by a detailed analysis report. The data modeler can use this as a starting point and iterate over the schema to arrive at the final MongoDB schema. The Infosys Data Model Converter reduces 50-60% of the effort typically spent in schema design. Key Features Boosts productivity by augmenting the migration of RDBMS to NoSQL database Saves time by automatically extracting schema, query and data patterns from an existing RDBMS Comprehensively analyzes the RDBMS entity relations, data and read-and-write patterns Applies a rich set of rules and generates a fully compliant NoSQL target state data model Offers flexibility by externalizing the rules for organization-specific customizations Connects and deploys the model to the target NoSQL platform with sample data Discover more ways in which Infosys can help you unlock value from modernization. Contact us for any modernization questions.

April 15, 2021
Developer

Introducing: Atlas Operator for Kubernetes

The MongoDB Enterprise Operator serves to automate and manage MongoDB clusters on self-managed infrastructure. While this integration has provided complete control over self-managed MongoDB deployments from a single Kubernetes control plane, we’re taking it a step further by extending this functionality to our fully-managed database—MongoDB Atlas. We’re excited to introduce the trial version of the Atlas Operator for Kubernetes. The Atlas Operator will allow you to manage all your MongoDB Atlas clusters without ever having to leave Kubernetes. Keep your workflow as seamless and optimized as possible by managing the lifecycle of your cloud-native applications from where you want most. With the trial version of this Atlas Operator, you can provision and deploy fully-managed MongoDB Atlas clusters on the cloud provider of your choice through Kubernetes. This provider is especially important for those seeking to unlock the power of multi-cloud with unique tools and services native to AWS, Google Cloud, and Azure without any added complexity to the data management experience. With this new Atlas Operator, you get the best of all clouds with multi-cloud clusters on Atlas , coupled with the freedom to run your entire stack anywhere, all while managed in one central location. The “trial version” simply means it has all the core functionality to provision fully-managed Atlas clusters, but the bells and whistles are yet to come. In addition to encapsulating core Atlas functionality, it ensures Kubernetes Secrets are created for each database user which allows for easier management of sensitive data. The Atlas Operator also allows you to create IP Bindings so your applications can securely access clusters. If you’re interested in using the trial version of the Atlas Operator today, follow our quickstart guide below to get started! Quickstart Below you’ll find the steps to create your first cluster in Atlas using the Atlas Operator. Note that you need to have a running Kubernetes cluster before deploying the Atlas Operator. Register/Login to Atlas and create API Keys for your Organization. This information together with the Organization ID will be used to configure the Atlas Operator access to Atlas. Deploy the Atlas Operator kubectl apply -f \ https://raw.githubusercontent.com/mongodb/mongodb-atlas-kubernetes/main/deploy/all-in-one.yaml Create a Secret containing connection information from step one. This Secret will be used by the Atlas Operator to connect to Atlas: kubectl create secret generic mongodb-atlas-operator-api-key \ --from-literal="orgId=<the_atlas_organization_id>" \ --from-literal="publicApiKey=<the_atlas_api_public_key>" \ --from-literal="privateApiKey=<the_atlas_api_private_key>" \ -n mongodb-atlas-system Create AtlasProject Custom Resource: cat <<EOF | kubectl apply -f - apiVersion: atlas.mongodb.com/v1 kind: AtlasProject metadata: name: my-project spec: name: Test Atlas Operator Project projectIpAccessList: - ipAddress: "0.0.0.0/0" comment: "Allowing access to database from everywhere (only for Demo!)" EOF Create AtlasCluster Custom Resource cat <<EOF | kubectl apply -f - apiVersion: atlas.mongodb.com/v1 kind: AtlasCluster metadata: name: my-atlas-cluster spec: name: "Test-cluster" projectRef: name: my-project providerSettings: instanceSizeName: M10 providerName: AWS regionName: US_EAST_1 EOF (You'll have to wait until the cluster is ready - "status" field shows "ready:true":) kubectl get atlasclusters my-atlas-cluster -o=jsonpath='{.status.conditions[?(@.type=="Ready")].status}' True Create a Secret for the password that will be used to log into Atlas Cluster Database kubectl create secret generic the-user-password \ --from-literal="password=P@@sword%" Create AtlasDatabaseUser Custom Resource (references the password Secret) cat <<EOF | kubectl apply -f - apiVersion: atlas.mongodb.com/v1 kind: AtlasDatabaseUser metadata: name: my-database-user spec: roles: - roleName: "readWriteAnyDatabase" databaseName: "admin" projectRef: name: my-project username: theuser passwordSecretRef: name: the-user-password EOF Shortly the Secret will be created by the Atlas Operator containing the data necessary to connect to the Atlas Cluster. You can mount it into your application Pod and read the connection strings from the file or from the environment variable. kubectl get secrets/test-atlas-operator-project-test-cluster-theuser \ -o=jsonpath="{.data.connectionString.standardSrv}} | base64 -d mongodb+srv://theuser:P%40%40sword%25@test-cluster.peqtm.mongodb.net Stay Tuned for More Be on the lookout for updates in future blog posts! The trial version of the MongoDB Atlas Operator is currently available on multiple marketplaces, but we’ll be looking to make enhancements in the near future. For more information, check out our MongoDB Atlas & Kubernetes GitHub page and our documentation .

April 8, 2021
Developer

MongoDB Connector for Apache Kafka 1.5 Available Now

Today, MongoDB has released version 1.5 of the MongoDB Connector for Apache Kafka! This article highlights some of the key features of this new release in addition to continuing to improve the overall quality & stability of the Connector . DeleteOne write model strategy When messages arrive on Kafka topics, the MongoDB Sink Connector reads them and by default will upsert them into the MongoDB cluster specified in the sink configuration. However, what if you didn’t want to always upsert them? This is where write strategies come in and provide you with the flexibility to define what you want to do with the document. While the concept of write strategies is not new to the connector, in this release there is a new write strategy available called DeleteOneBusinessKeyStrategy . This is useful for when a topic contains records identifying data that should be removed from a collection in the MongoDB sink. Consider the following: You run an online store selling fashionable face masks. As part of your architecture, the website sends orders to a Kafka topic, “web-orders” which upon message arrival kicks off a series of actions such as sending an email confirmation, and inserting the order details into an “Orders” collection in a MongoDB cluster. A sample Orders document: { _id: ObjectId("6053684f2fe69a6ad3fed028"), 'customer-id': 123, 'order-id': 100, order: { lineitem: 1, SKU: 'FACE1', quantity: 1 } } This process works great, however, when a customer cancels an order, we need to have another business process to update our inventory, send the cancellation, email and remove the order from our MongoDB sink. In this scenario a cancellation message is sent to another Kafka topic, “canceled-orders”. For messages in this topic, we don’t just want to upsert this into a collection, we want to read the message from the topic and use a field within the document to identify the documents to delete in the sink. For this example, let’s use the order-id key field and define a sink connector using the DeleteOneBusinessKeyStrategy as follows: "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector", "topics":"FaceMaskWeb.OrderCancel", "connection.uri":"mongodb://mdb1", "database":"FaceMaskWeb", "collection":"Orders", "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.DeleteOneBusinessKeyStrategy", "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy", "document.id.strategy.partial.value.projection.type": "AllowList", "document.id.strategy.partial.value.projection.list": "order-id", "value.converter":"org.apache.kafka.connect.json.JsonConverter", "value.converter.schemas.enable":false, "document.id.strategy.overwrite.existing": true Now when messages arrive in the “FakeMaskWeb.OrderCancel” topic, the “order-id” field is used to delete documents in the Orders collection. For example, using the sample document above, if we put this value into the OrderCancel topic { “order-id”: 100 } It would cause the document in the Orders collection with order-id and value 100 to be deleted. For a complete list of write model strategies check out the MongoDB Kafka Connector Sink documentation . Qlik Replicate Qlik Replicate is recognized as an industry leader in data replication and ingestion. With this new release of the Connector, you can now replicate and stream heterogeneous data from data sources like Oracle, MySQL, PostGres and others to MongoDB via Kafka and the Qlik Replicate CDC handler . To configure the MongoDB Connector for Apache Kafka to consume Qlik Replicate CDC events, use “com.mongodb.kafka.connect.sink.cdc.qlik.rdbms.RdbmsHandler” as the value for the change data capture handler configuration parameter. The handler supports, insert, refresh, read, update and delete events. Errant Record Reporting Kafka Connect, the service which manages connectors that integrate with a Kafka deployment, has the ability to write records to a dead letter queue (DLQ) topic if those records could not be serialized or deserialized. Starting with Apache Kafka version 2.6, there was added support for error reporting within the sink connectors. This gives sink connectors the ability to send individual records to the DLQ if the connector deems the records to be invalid or problematic. For example, if you are projecting fields in the sink that do not exist in the kafka message or if your sink is expecting a JSON document and the message arrives in a different format. In these cases an error is written to the DLQ versus failing the connector. Various Improvements As with every release of the connector, we are constantly improving the quality and functionality. This release is no different. You’ll also see pipeline errors now showing up in the connect logs, and the sink connector can now be configured to write to the dead letter queue! Next Steps Download the latest MongoDB Connector for Apache Kafka 1.5 from the Confluent Hub ! Read the MongoDB Connector for Apache Kafka documentation . Questions/Need help with the connector? Ask the Community . Have a feature request? Provide Feedback or a file a JIRA .

April 7, 2021
Developer

Dive Deeper into Chart Data with New Drill-Down Capability

With the latest release of MongoDB Charts, you’re now able to dive deeper into the data that’s aggregated in your visualizations. At a high level, we generally create charts, graphs and visualizations of our data to answer questions about our business or products. Oftentimes, we need to “double click” on those visualizations to get insight into each individual data point that makes up the line, bar, column, etc. How the drill-down functionality works: Step 1: Right click on the data point you are interested in drilling down into Step 2: Click "show data for this item" Step 3: View the data in tabular or document format Each view can be better for different circumstances. For data without too many fields or no nested arrays, it might be quicker and more easily viewed in a table. On the other hand, the JSON view allows you to explore the structure of documents and click into arrays. Scenarios where more detailed information can help: Data visualization use cases are relatively broad spanning, but oftentimes they fall into 3 main categories: monitoring data, finding insights, and embedding analytics into applications. I’ll be focusing on the first two of these three as there are many different ways you could potentially build drilling-down into data via embedded charts. (Read more about our click events and embedded analytics ). For data or performance monitoring purposes , we're not speaking so much about the performance of your actual database and its underlying infrastructure, but the performance of the application or system built on top of the database. Imagine I have an application or website that takes reviews, if I build a chart like the one below where I want to easily see when an interaction hits a threshold that I want to dive deeper into, I now have the ability to quickly see the document that created that data point. This chart shows app ratings given after a user session in an app. For this example, we want to dive into any rating that was below a 3 (out of 5). This scatter plot shows I have two such ratings that cross that threshold. With the drill-down capability, I can easily see all the details captured in that user session. For finding new insights, let’s imagine I’m tracking how many transactions happen on my ecommerce site over time. In the column chart below, you can see I have purchases by month for the last year and a half (note, there’s a gap because this example is for a seasonal business!). Just by glancing at the chart, I can quickly see purchases have increased over time, and my in-app purchases have increased my overall sales. However, I want to see more about the documents that were aggregated to create those columns, so I can quickly see details about the transaction amount and location without needing to create another chart or dashboard filter. In both examples, I was able to answer a deeper level question that the original chart couldn’t answer on it’s own. We hope this new feature helps you and your stakeholders get more out of MongoDB Charts, regardless if you’re new to it or have been visualizing your Atlas data with it for months, if not years! If you haven’t tried Charts yet, you can get started for free by signing up for a MongoDB Atlas and deploying a free tier cluster.

April 6, 2021
Developer

Atlas as a Service

Many of our customers provide MongoDB as a service to their development teams, where developers can request a MongoDB database instance and receive a connection string and credentials in minutes. As those customers move to MongoDB Atlas , they are similarly interested in providing the same level of timely service to their developers. Atlas has a very powerful control plane for provisioning clusters. In large organizations that have thousands of developers, however, it is not always practical to give so many people direct access to that interface. The goal of this article is to show how the Atlas APIs can be used to provide MongoDB as a service when MongoDB is managed by Atlas. Specifically, we’ll demonstrate how an interface could be created that offers developers a set of choices for a MongoDB database instance. For simplicity, this represents how to provide developers with a list of memory and storage options to configure their cluster. Other options like cloud provider and region are abstracted away. We also demonstrate how to add labels to the Atlas clusters, which is a feature that the Atlas UI doesn’t support. For example, we’ve added a label for cluster description. Architecture Although the Atlas APIs could be called directly from the client interface, we elected to use a 3 tier architecture, the benefits being: We can control the extent of functionality to what is needed. We can simplify the APIs exposed to the front-end developers. We can more granulary secure the API endpoints. We could take advantage of other backend features such as Triggers , Twilio integration, etc. Naturally, we selected Realm to host the middle tier. Implementation Backend Atlas API The Atlas APIs are wrapped in a set of Realm Functions . For the most part, they all call the Atlas API as follows (this is getOneCluster): /* * Gets information about the requested cluster. If clusterName is empty, all clusters will be fetched. * See https://docs.atlas.mongodb.com/reference/api/clusters-get-one * */ exports = async function(username, password, projectID, clusterName) { const arg = { scheme: 'https', host: 'cloud.mongodb.com', path: 'api/atlas/v1.0/groups/' + projectID +'/clusters/' + clusterName, username: username, password: password, headers: {'Content-Type': ['application/json'], 'Accept-Encoding': ['bzip, deflate']}, digestAuth:true }; // The response body is a BSON.Binary object. Parse it and return. response = await context.http.get(arg); return EJSON.parse(response.body.text()); }; You can see each function’s source on GitHub . MiniAtlas API The next step was to expose the functions as endpoints that a frontend could use. Alternatively, we could have called the functions using the Realm Web SDK , but we elected to stick with the more familiar REST protocol for our frontend developers. Using Realm Third-Party Services , we developed the following 6 endpoints: table, th, td { border: 1px solid black; border-collapse: collapse; } API Method Type Endpoint Get Clusters GET /getClusters Create a Cluster POST /getClusters Get Cluster State GET /getClusterState?clusterName:cn Modify a Cluster PATCH /modifyCluster Pause or Resume Cluster POST /pauseCluster Delete a Cluster DELETE /deleteCluster?clusterName:cn Here’s the source for getClusters. Note is pulls the username and password from the Values & Secrets : /* * GET getClusters * * Query Parameters * * None * * Response - Currently all values documented at https://docs.atlas.mongodb.com/reference/api/clusters-get-all/ * */ exports = async function(payload, response) { var results = []; const username = context.values.get("username"); const password = context.values.get("apiKey"); projectID = context.values.get("projectID"); // Sending an empty clusterName will return all clusters. var clusterName = ''; response = await context.functions.execute("getOneCluster", username, password, projectID, clusterName); results = response.results; return results; }; You can see each webhook’s source on GitHub . When the webhook is saved, a Webhook URL is generated, which is the endpoint for the API: API Endpoint Security Only authenticated users can execute the API endpoints. The caller must include an Authorization header containing a valid user id, which the endpoint passes through this script function: exports = function(payload) { const headers = context.request.requestHeaders const { Authorization } = headers const user_id = Authorization.toString().replace(/^Bearer/, '') return user_id }; MongoDB Realm includes several built-in authentication providers including anonymous users, email/password combinations, API keys , and OAuth 2.0 through Facebook , Google , and Apple ID . For this exercise, we elected to use Google OAuth , primarily because it’s already integrated with our SSO provider here at MongoDB. The choice of provider isn’t important. Whatever provider or providers are enabled will generate an associated user id that can be used to authenticate access to the APIs. Frontend The frontend is implemented in JQuery and hosted on Realm . Authentication The client uses the MongoDB Stitch Browser SDK to prompt the user to log into Google (if not already logged in) and sets the users Google credentials in the StitchAppClient . let credential = new stitch.GoogleRedirectCredential(); client.auth.loginWithRedirect(credential); Then, the user Id required to be sent in the API call to the backend can be retrieved from the StitchAppClient as follows: let userId = client.auth.authInfo.userId; And set in the header when calling the API. Here’s an example calling the createCluster API: export const createCluster = (uid, data) => { let url = `${baseURL}/createCluster` const params = { method: "post", headers: { "Content-Type": "application/json;charset=utf-8", ...(uid && { Authorization: uid }) }, ...(data && { body: JSON.stringify(data) }) } return fetch(url, params) .then(handleErrors) .then(response => response.json()) .catch(error => console.log(error) ); }; You can see all the api calls in webhooks.js . Tips We had great success using Postman Team Workspaces to share and validate the backend APIs. Conclusions This prototype was created to demonstrate what’s possible, which by now you hopefully realize is anything! The guts of the solution are here - how you wish to extend it is up to you. Learn more about the MongoDB Atlas APIs .

March 2, 2021
Developer

Ready to get Started with MongoDB Atlas?

Start Free