MongoDB Updates

The newest releases and freshest updates

Fine-Tune Relevance in MongoDB Atlas Search with Function Scoring and Synonyms

MongoDB Atlas Search is an embedded full-text search solution in MongoDB Atlas that gives developers a seamless and scalable experience for building fast, relevance-based application features. We announced its general availability last year at MongoDB.live 2020, and over the past year we’ve introduced many new features, including a visual index builder, search query tester, custom analyzers, and wildcard path queries. This year at MongoDB.live 2021, we’re excited to highlight two new capabilities that help developers tune the relevance of search results. See how easy it is to get started with MongoDB Atlas Search in this demo video by Marcus Eagan, Senior Product Manager for Atlas Search.

Building relevance into search results

Understanding the behavior of your users is essential when thinking about search result relevance. People don’t always tell you what they want, and they sometimes use words or phrases that don’t match your content exactly. To cover these scenarios, you can use full-text search features like function scoring and synonyms.

Influence search rankings with function scoring

There are often multiple factors that influence how search results should be ranked. For example, let’s say you have a restaurant finder application. The explicit inputs are things like the user’s location and what they’re searching for, but what’s implied is that they likely want to see highly rated restaurants or ones with more reviews.

What’s Cooking: a sample restaurant finder application using MongoDB Atlas Search

Function scoring allows you to influence the order of results returned by manipulating the score of each result. In Atlas Search, that means you can take a numeric field in a document and apply a mathematical expression to it. For example, you might want to increase the score of restaurants that are sponsored or have higher star ratings. This can be accomplished within the same search query by adding the function option to the score parameter of your query. Learn more about how to use function scores in our developer tutorial.

Show results for more search queries with synonyms

Synonyms are often used to define terms that are semantically similar to each other to improve search results. For example, someone searching for “noodles” might want to find results for “spaghetti”, “chow mein”, or “pad thai”. Synonyms can also help with typos, especially on mobile and small keyboards. In Atlas Search, you can define collections of synonyms for a search index via the API. Synonyms can be explicit (one-way) or equivalent (two-way). Explicit synonyms are good for defining relationships between terms that are subsets of each other, like the noodle example above: “spaghetti”, “chow mein”, and “pad thai” are all explicit synonyms for “noodles”, but not for each other (you don’t want results for “chow mein” in a search for “spaghetti”). Equivalent synonyms are used for terms that are interchangeable in both directions, such as regional variations like soda and pop, or Kleenex and tissues.

What's next for Atlas Search

Developers are increasingly turning to full-text search to make content more discoverable and relevant for application end users. With Atlas Search, we hope to make building full-text search not only easier, but also more powerful and expressive. Join our community to ask questions and find out what other developers are building with Atlas Search, and let us know what you think we should build next in our feedback forums.
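To make these two capabilities a little more concrete, here is a hedged sketch of what they can look like in practice. The collection, index name, field names, and synonym mapping name below are illustrative assumptions, not taken from the tutorial referenced above.

// Hypothetical document in the synonyms source collection referenced by the
// search index, defining the one-way ("explicit") noodle mapping described above.
db.dish_synonyms.insertOne({
  mappingType: "explicit",
  input: ["noodles"],
  synonyms: ["spaghetti", "chow mein", "pad thai"]
})

// A $search stage that applies the synonym mapping and boosts results by star rating.
db.restaurants.aggregate([
  {
    $search: {
      index: "default",                  // assumed index name
      text: {
        query: "noodles",
        path: "cuisine",
        synonyms: "dishSynonyms",        // assumed name of the synonym mapping in the index definition
        score: {
          function: {
            multiply: [
              { score: "relevance" },                              // base relevance score
              { path: { value: "starRating", undefined: 1 } }      // numeric field used as a boost
            ]
          }
        }
      }
    }
  },
  { $limit: 5 }
])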

July 13, 2021
Updates

Introducing Serverless Instances on MongoDB Atlas, Now Available in Preview

Since we first launched MongoDB Atlas in June 2016, we’ve been working towards building a cloud database that not only delivers a first-class developer experience, but also simply works: no setup, tuning, or maintenance required. Over the years, this has led to features like auto-scaling and click-to-create index suggestions, along with numerous optimizations to our automation engine. We’re excited to announce that we’re one step closer to realizing this vision with the introduction of serverless databases on MongoDB Atlas.

Think less about your database, and more about your data

Serverless computing and NoOps have emerged as popular trends in modern application development. Cloud functions are commonly used to power business logic in applications, and many teams rely on completely automated IT operations. The appeal of serverless technology is hard to deny: elastic scaling eliminates the need for upfront resource provisioning and ongoing maintenance, and consumption-based pricing means paying only for the resources you use. It abstracts and automates away many of the lower-level infrastructure decisions that developers don’t want to learn or manage, so they can focus on building differentiated features.

When it comes to databases, compute and storage resources have traditionally been tightly coupled. Applying a serverless model to databases means decoupling them and changing the way engineering teams think about infrastructure. Rather than asking a developer to predict an application’s future workload patterns, break them down into individual resource requirements, and then map them to arbitrary units of database instance sizes, serverless databases offer a much simpler experience: define where your data lives, and get a database endpoint you can use. This not only streamlines the database deployment process, it also eliminates the need to monitor and adjust capacity on an ongoing basis. Developers are free to focus on their data rather than their databases, and leave the lower-level infrastructure decisions to intelligent, behind-the-scenes automation.

Serverless instances on MongoDB Atlas

All customers now have the ability to create a serverless database on MongoDB Atlas with the introduction of serverless instances, announced at MongoDB.live 2021. It’s incredibly easy to get started: simply choose a cloud region and you’ll receive an on-demand database endpoint for your application. Serverless instances always run on the latest MongoDB version, so you never have to worry about backwards compatibility or upgrades. You can view and manage them using the same UI and API as your existing database deployments on Atlas (i.e., clusters), and they come with end-to-end security, continuous uptime, metrics, alerts, and backups.

Watch this demo of how to create a serverless instance on MongoDB Atlas

This new deployment type is available in preview, so it doesn’t yet support all of the features and capabilities available on clusters today. It’s ideal for infrequent or sparse workloads, or for development and testing workloads in the cloud. If you’re running a high-throughput production workload, dedicated clusters are still the recommended deployment option.

A hands-free database experience

This is the first of many releases, and we have an ambitious roadmap ahead.
We will continue to invest in making working with data ever more seamless and delightful for developers, from adding support for newer Atlas capabilities like full-text search and native visualizations, to even more intelligent automation and optimization. Create your own serverless instance on MongoDB Atlas: try the Preview. If you have feedback or questions, we’d love to hear them! Join our community forums to meet other MongoDB developers and see what they’re building with serverless instances.

What's next for MongoDB Atlas

Serverless instances are just one of many new additions to Atlas that we hope will make developers’ lives easier. Earlier this year, we added index removal suggestions to Performance Advisor and released a quick start for creating and managing clusters via the command line with the MongoDB CLI. We are also working on integrations with Vercel and Netlify, two popular serverless application platforms, to give developers an easy way to get started on MongoDB Atlas. What would make your development experience better on MongoDB Atlas? Share your feature requests in our feedback forums.

July 13, 2021
Updates

Streaming Time-Series Data Using Apache Kafka and MongoDB

There is one thing the world agrees on, and it is the concept of time. Many applications are heavily time-based. Consider solar field power generation, stock trading, and health monitoring: these are just a few of the plethora of applications that produce and use data with a critical time component. In general, time-series applications are heavy on inserts, rarely perform updates, and are even less likely to delete data. These applications generate a tremendous amount of data and need a robust data platform to effectively manage and query it. With MongoDB, you can easily:

Pre-aggregate data using the MongoDB Query Language and window functions
Optimally store large amounts of time-series data with MongoDB time-series collections
Archive data to cost-effective storage using MongoDB Atlas Online Archive

Apache Kafka is often used as an ingestion point for data due to its scalability. Through the use of the MongoDB Connector for Apache Kafka and the Apache Kafka Connect service, it is easy to transfer data between Kafka topics and MongoDB clusters. Starting in the 1.6 release of the MongoDB Connector for Apache Kafka, you can configure Kafka topic data to be written directly into a time-series collection in MongoDB. This configuration happens in the sink.

Configuring time series collections in the sink

With MongoDB, applications do not need to create the database and collection before they start writing data; these objects are created automatically upon first arrival of data in MongoDB. However, a time-series collection needs to be created before you start writing data. To make it easy to ingest time-series data into MongoDB from Kafka, these collection options are exposed as sink parameters, and the time-series collection is created by the connector if it doesn’t already exist. Some of the new parameters are defined as follows:

timeseries.timefield: Name of the top-level field used for time.
timeseries.expire.after.seconds: This optional field determines how long data remains in MongoDB before being automatically deleted. Omitting this field means data will not be deleted automatically. If you are familiar with TTL indexes in MongoDB, setting this field provides similar behavior.
timeseries.timefield.auto.convert: This optional field tells the connector to convert the data in the field into a BSON Date format. Supported formats include integer, long, and string.

For a complete list of the new time-series parameters, check out the MongoDB Sink connector online documentation.
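The connector creates this collection for you, but for reference, here is a hedged sketch of what creating an equivalent time-series collection manually in mongosh looks like. The database and field names mirror the stock-ticker example later in this post; the metaField, granularity, and expiration values are illustrative assumptions.

db.createCollection("StockDataTS", {
  timeseries: {
    timeField: "tx_time",          // corresponds to timeseries.timefield in the sink configuration
    metaField: "company_symbol",   // assumed: optional field used to group each series
    granularity: "seconds"         // assumed: expected spacing between measurements
  },
  expireAfterSeconds: 2592000      // assumed: corresponds to timeseries.expire.after.seconds (30 days)
})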
When data is stored in time-series collections, MongoDB optimizes the storage and bucketing of your data behind the scenes. This saves a tremendous amount of storage space compared to the typical one-document-per-data-point structure of regular collections. You can also explore the many new time and window functions within the MongoDB Query Language. For example, consider this sample document structure:

{
  tx_time: 2021-06-30T15:47:31.000Z,
  _id: '60dc921372f0f39e2cd6cba5',
  company_name: 'SILKY CORNERSTONE LLC',
  price: 94.0999984741211,
  company_symbol: 'SCL'
}

You can use the new $setWindowFields pipeline stage to define the window of documents to perform an operation on, then compute rankings, cumulative totals, and other analytics over complex time-series data. For example, using the data generated in the tutorial, let’s determine the rolling average of the price as follows:

db.StockDataTS.aggregate([
  { $match: { company_symbol: 'SCL' } },
  {
    $setWindowFields: {
      partitionBy: '$company_name',
      sortBy: { tx_time: 1 },
      output: {
        averagePrice: {
          $avg: '$price',
          window: { documents: [ 'unbounded', 'current' ] }
        }
      }
    }
  }
])

A sample of the result set is as follows:

{
  tx_time: 2021-06-30T15:47:45.000Z,
  _id: '60dc922172f0f39e2cd6cbeb',
  company_name: 'SILKY CORNERSTONE LLC',
  price: 94.06999969482422,
  company_symbol: 'SCL',
  averagePrice: 94.1346669514974
},
{
  tx_time: 2021-06-30T15:47:47.000Z,
  _id: '60dc922372f0f39e2cd6cbf0',
  company_name: 'SILKY CORNERSTONE LLC',
  price: 94.1500015258789,
  company_symbol: 'SCL',
  averagePrice: 94.13562536239624
},
{
  tx_time: 2021-06-30T15:47:48.000Z,
  _id: '60dc922472f0f39e2cd6cbf5',
  company_name: 'SILKY CORNERSTONE LLC',
  price: 94.0999984741211,
  company_symbol: 'SCL',
  averagePrice: 94.13352966308594
}

Notice that the additional “averagePrice” field is now populated with a rolling average. For more information on time-series collections in MongoDB, check out the online documentation.

Migrating existing collections

To convert an existing MongoDB collection to a time-series collection, you can use the MongoDB Connector for Apache Kafka. Simply configure the source connection to your existing collection and configure the sink connector to write to a MongoDB time-series collection by using the “timeseries.timefield” parameter. You can configure the source connector to copy existing data by setting the “copy.existing” parameter to true. This creates insert events for all existing documents in the source; any documents inserted during the copying process will be inserted once the copy has finished. While not always possible, it is recommended to pause writes to the source data while the copy process is running. To see when it finishes, check the logs for the message “Finished copying existing data from the collection(s).”

For example, consider a source document with this structure:

{
  company_symbol: (STRING),
  company_name: (STRING),
  price: (DECIMAL),
  tx_time: (STRING)
}

For the initial release of MongoDB time-series collections, the field that represents the time is required to be stored as a Date. In our example, we are using a string to showcase the connector’s ability to automatically convert from a string to a Date. If you chose to perform the conversion outside of the connector, you could use a Single Message Transform (SMT) in Kafka Connect to convert the string into a Date at the sink. However, certain SMTs like TimestampConverter require schemas to be defined for the data in the Kafka topic in order to work, which may add complexity to the configuration. Instead of using an SMT, you can convert strings into Dates automatically using the new timeseries.timefield.auto.convert and timeseries.timefield.auto.convert.date.format options.
Here is a sample source configuration that will copy all existing data from the StockData collection and then continue to push data changes to the stockdata.Stocks.StockData topic:

{"name": "mongo-source-stockdata",
 "config": {
   "tasks.max": "1",
   "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
   "key.converter": "org.apache.kafka.connect.storage.StringConverter",
   "value.converter": "org.apache.kafka.connect.json.JsonConverter",
   "publish.full.document.only": true,
   "connection.uri": (MONGODB SOURCE CONNECTION STRING),
   "topic.prefix": "stockdata",
   "database": "Stocks",
   "collection": "StockData",
   "copy.existing": "true"
}}

This is a sample configuration for the sink to write the data from the stockdata.Stocks.StockData topic to a MongoDB time-series collection:

{"name": "mongo-sink-stockdata",
 "config": {
   "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
   "tasks.max": "1",
   "topics": "stockdata.Stocks.StockData",
   "connection.uri": (MONGODB SINK CONNECTION STRING),
   "database": "Stocks",
   "collection": "StockDataMigrate",
   "key.converter": "org.apache.kafka.connect.storage.StringConverter",
   "value.converter": "org.apache.kafka.connect.json.JsonConverter",
   "timeseries.timefield": "tx_time",
   "timeseries.timefield.auto.convert": "true",
   "timeseries.timefield.auto.convert.date.format": "yyyy-MM-dd'T'HH:mm:ss'Z'"
}}

In this sink example, the connector will convert the data in the “tx_time” field into a Date, parsing it with the expected string format yyyy-MM-ddTHH:mm:ssZ (e.g. '2021-07-06T12:25:45Z').

Note that in the initial version of time-series collections, only inserts into a time-series collection are supported: updating or deleting documents on the source will not propagate to the destination. Also, you cannot use the MongoDB CDC Handler in this scenario because the handler uses ReplaceOne, which is a type of update command. These are limitations of the initial release of time-series collections in MongoDB and may no longer apply by the time you read this post; check the online documentation for the latest information.

The MongoDB Connector for Apache Kafka version 1.6 is available to download from GitHub. Look for it on the Confluent Hub later this week!

July 13, 2021
Updates

The new MongoDB Shell is GA!

The new MongoDB Shell (mongosh) is now GA and becomes the default shell for the MongoDB platform. Download it now and start using it right away.

Like all software we build at MongoDB, a great user experience is a major consideration, and we believe it is just as important when working with a command-line tool. To deliver this experience to our users, we have redesigned the MongoDB Shell from the ground up to provide a modern command-line experience with enhanced usability features and a powerful scripting environment. After a year in beta, with a lot of great feedback from users and customers, we are excited to announce the general availability of the MongoDB Shell, the best way to work with your data and your MongoDB deployments from the command line. The new MongoDB Shell is compatible with MongoDB 4.0+, so you don’t have to wait to upgrade to MongoDB 5.0 to start using it. You can download it and try it out now!

Enhanced user experience

To make queries and aggregations easier to write and results easier to read, the MongoDB Shell comes with syntax highlighting. Now it’s much easier to distinguish fields, values, and data types, which helps avoid syntax errors. If an error still occurs, the shell points you to the problem and helps you understand how to fix it. To help you type your queries and commands faster, the new MongoDB Shell includes intelligent autocomplete: based on the version of MongoDB you are connected to, the shell can suggest autocomplete options for methods, commands, and even MQL expressions. And when you don’t remember the syntax for a command, you can quickly look it up directly from the shell.

Advanced scripting environment

The MongoDB Shell is a great scripting environment. It’s built on top of the Node.js REPL, which means you can use the entire Node.js API in your scripts. Not only that: in your scripts for the MongoDB Shell you can now use any module from npm. In the video below, you can see how I used node-fetch to fetch some data from a REST API and store it in MongoDB. Of course, you can also load and run scripts from the filesystem: as with the legacy mongo shell, in mongosh you can keep using load and eval to execute your scripts.

Extensibility and snippets

One of the goals we set for ourselves when we started building the new MongoDB Shell was to make it easy to extend. This way, as the MongoDB platform grows with new products and services, the shell can grow with it. We also wanted to give our users and customers the ability to extend the shell with all the functionality they need to be productive with MongoDB. While that has always been somewhat possible by loading scripts at startup with an RC file, we decided to take it one step further: in mongosh, you can install Snippets. Snippets are plugins that are automatically loaded into the shell. Snippets can use any Node.js API and npm packages, allowing you to support a wide variety of use cases. We maintain a repository with a few Snippets that offer new, interesting functionality (e.g. a snippet to analyze the schema of a given collection), but you are free to configure mongosh to use a snippet registry of your choice. Snippets are currently an experimental feature of mongosh. We are curious to see how you use them and to get your feedback, so we can take the feature in the right direction and help you customize the shell exactly the way you need it.
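To give a feel for the scripting environment described above, here is a minimal sketch of a mongosh script. The file, collection, and field names are made up for illustration, and it assumes a restaurants.json file sitting next to the script.

// seed.js: run with `mongosh "mongodb+srv://<your cluster>/test" seed.js`
// or from an open shell session with `load('seed.js')`
const fs = require('fs');                       // Node.js built-in modules are available
const docs = JSON.parse(fs.readFileSync('restaurants.json', 'utf8'));
db.restaurants.insertMany(docs);                // `db` refers to the database of the current connection
print(`restaurants now contains ${db.restaurants.countDocuments()} documents`);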
What happens with the legacy mongo shell?

You might be wondering what will happen with the legacy mongo shell. We are not taking it away quite yet. However, starting with MongoDB 5.0 the legacy shell is deprecated, and we encourage you to switch to mongosh as your default shell.

Get started with mongosh!

The new MongoDB Shell is available in our download center. Install it, connect to a MongoDB cluster, and start scripting! Learn more about the MongoDB Shell and how to use it in our online documentation. If you have feedback or would like to suggest new features, please let us know through our feedback engine.

July 12, 2021
Updates

Visualize Blended Atlas and AWS S3 Data From Atlas Data Lake with MongoDB Charts

We’re excited to announce that MongoDB Charts supports Atlas Data Lake as a data source! You can now use Charts to easily visualize data stored across different Atlas databases and AWS S3 buckets. Thanks to the aggregating power of Atlas Data Lake’s federated query, creating charts and graphs from blended application and cloud object data is simpler than ever before.

On the surface, this powerful integration is as simple as adding your Atlas Data Lake as a data source within Charts. However, it unlocks a deeper level of analysis while eliminating the need to build an Extract-Transform-Load (ETL) process across your Atlas and S3 data. The integration provides the ability to visualize data from the following combinations of sources without writing any code:

Data from many Atlas databases or clusters, including multi-cloud clusters
Cloud storage data from AWS S3
Blended Atlas and cloud storage (AWS S3) data

Scenario: Finding insights from aggregated customer profile and contract data

Let’s walk through a real-world scenario of how this can enhance the analytics you derive from your data. Along the way, we will cover the steps of setting up your Atlas Data Lake, adding it as a data source to Charts, and getting the most out of your data with Charts’ powerful visualization capabilities. For context, let’s imagine we’re an analyst at a telecom company, and we have contract data stored in MongoDB Atlas in different clusters and databases for each country we operate in: the United States and Canada. Second, we have offloaded data from our Customer Relationship Management (CRM) tool as a Parquet file into an AWS S3 bucket. All three datasets share a common “customerID” field.

Configure Atlas Data Lake

Because both “contracts” collections (or datasets) in MongoDB Atlas share the same fields, I simply mapped both into a single collection within the data lake. I mapped the customer profiles dataset into its own collection, since it only shares the “customerID” field. However, now that it’s in the same data lake, I can easily join it to my contract data with a $lookup in my Charts aggregation pipeline or with a Lookup Field in the chart builder. (A $lookup in the MongoDB Query API is equivalent to a join in SQL.)
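As a rough illustration of that join, here is a hedged sketch of the kind of $lookup stage you could add to the Charts data source pipeline. The collection names (allcontracts, customerprofiles) and the output field are assumptions for this example.

// Pipeline applied to the data lake collection holding the contracts (e.g. allcontracts)
[
  {
    $lookup: {
      from: 'customerprofiles',      // assumed name of the customer profile collection in the data lake
      localField: 'customerID',      // shared field on the contracts side
      foreignField: 'customerID',    // shared field on the profiles side
      as: 'profile'
    }
  },
  { $unwind: '$profile' }            // flatten the joined array so profile fields are easy to chart
]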
Configure Charts data source

I want to find insights from all contracts, both US and Canada, in this scenario. Once I have created a single Atlas Data Lake collection (DL_contracts.allcontracts) from the two separate databases, I then need to add it as a data source in Charts. Simply click “add data source” within Charts, add your data lake, and then choose the collections we want to use in the next step. For completeness, I also added the two Atlas collections (US and Canada contracts) as data sources in Charts by following the same steps.

Visualize data across multiple Atlas databases

With Atlas Data Lake’s federated query capability, which effectively performs a union of data, I am able to build a column chart that shows the amount of all US and CA contracts in a single chart without writing any code. As you can see below, the chart shows both US and CA columns when connected to the data lake collection. When the data source is switched directly to either Atlas database, it only shows data for that respective database, or country in this example.

Visualize blended data from Atlas and an AWS S3 bucket

Lastly, let’s take our insights to the next level by visualizing data from multiple Atlas databases and a Parquet file that’s stored in an AWS S3 bucket. Adding the customer profile data that I offloaded from my CRM tool into S3 enables me to find more robust insights. I could also visualize the data from the Parquet file alone by connecting to that data lake collection.

Since the contract data and customer profile data are in different collections within my Atlas Data Lake, I created a $lookup in the aggregation pipeline of the Charts data source. I then created a table chart from three different data sources with conditional formatting to quickly identify high-value customers. The columns with blue boxes include contract data from both Atlas clusters, while the columns with orange boxes include customer profile data from a Parquet file via an AWS S3 bucket.

Note that I could also aggregate the data in Atlas Data Lake and use $out to create a new collection of the data, and then connect Charts to the new collection as a data source. For the purposes of this blog, I wanted to highlight Charts-specific aggregation capabilities.

We hope that you’re excited about the ability to easily visualize multiple data sources, from multiple Atlas databases to AWS S3 buckets, in one place! Remember, if you haven’t used Charts before, you can get started for free by signing up for MongoDB Cloud, deploying an Atlas cluster, and activating Charts. Try MongoDB Atlas for free today!

July 9, 2021
Updates

Distinguish Data, Get Insights Faster with Conditional Formatting in Charts

The latest release of MongoDB Charts adds Conditional Formatting: an exciting new feature that enables chart authors to highlight important changes in their chart data, based on a set of rules that they define. Conditional Formatting rules can be applied to table charts and number charts.

Why use Conditional Formatting?

For table charts, the data is densely packed into the visualisation using rows and columns. This is great for comparing many values simultaneously, but as the density increases it may become more difficult to find and focus on the data that matters. Many authors use number charts to track key individual metrics within their data. While the number itself can be useful, sometimes it isn’t enough to convey the necessary context; for instance, is a high number good or bad? Conditional Formatting can aid users in understanding the data by applying different styles based on rules, highlighting what is important and providing more context.

See Conditional Formatting in Action with Formula 1 Data

Formula 1 motorsport is what I like to refer to as the “sport of nerds”, because analyzing and understanding huge amounts of data, and being able to quickly make a decision based on that analysis, can be the difference between winning and losing. So let’s see how Conditional Formatting can help with this task using data from the 2021 FIA Formula 1 World Championship.

Single Color Conditional Formatting

Let’s start off with something simple. Below is a table showing the 2021 Drivers’ Championship after three rounds. A driver’s position in the championship is determined by the total number of points they have been awarded over successive rounds of the season. Let’s edit this chart and add Conditional Formatting to highlight the top three drivers in the championship with colors representing 1st, 2nd, and 3rd place.

Click on the Customize tab, and then click on the Conditional Formatting menu to expand the accordion. As you can see, we haven’t yet defined any rules, so let’s add a new rule by clicking the + Add button. A drawer will open from the left-hand side of the screen displaying the Add Format Rules view. A conditional formatting rule must have at least one condition, and all conditions must match in order for the rule to be applied.

Let’s highlight the row of the driver currently in 1st place by adding a single color rule with one condition. Since this rule will be determined by the driver’s current position in the championship, we need to add a condition that acts on this data. We can target this field by selecting Pos from the Applies to select control. Now that we know what field we are targeting, we must next choose an operator to use for the comparison. Since we are only interested in data that matches a specific value, we select the Equal To numeric operator. Next we must provide an input for the operand to be compared to. For this rule we are only interested in highlighting the driver in first place in the championship, so we enter a value of 1 into the Input. You can think of this condition as asking: “Is the value of the field Pos equal to 1?” If it is, the styling of this rule is applied; otherwise it is not. Finally, we choose what styling changes should be applied from the options under Styling. In this example, we want to highlight the background color of the cell in gold to signify 1st place, and we will also apply a bold font weight to the text to make it more prominent.
Additionally, we would like these styles to be applied to the entire row, and not just the cell the condition applies to, so we check the Format entire row option. And that’s it! Once we save the rule, you’ll notice that the table re-renders in the Chart Builder preview to show that the data is being evaluated correctly and the Conditional Formatting rule is applied. We then simply rinse and repeat this process to add additional rules highlighting the drivers in 2nd and 3rd place, resulting in the following output:

Color Scale Conditional Formatting

When comparing tabular data, sometimes it is desirable to use color to show where each value lies relative to other values in the column. The table below shows the race results for the third round of the 2021 FIA Formula 1 World Championship. Each row displays the final result for each driver taking part in the race. Let’s compare the Average Speed of each driver’s Fastest Lap using a Color Scale Conditional Formatting rule. Navigate to the Add Format Rule screen the same way, by going to “Customize > Conditional Formatting > + Add”, but this time select the Color Scale radio option.

Note: Conditional Formatting Color Scale rules can only have one condition, and this condition can only be applied to fields in a table chart encoded as Value columns.

Select the Fastest Lap Average Speed as the target for the condition. You’ll notice that, unlike the discrete Single Color rules, there are no other settings to configure for the condition. This is because a Color Scale compares the values across the documents in sort order and determines a background color for each cell based on the rank of its value within the range. Since we are interested in finding the highest average speed across each driver’s fastest lap, we select a sequential color scale, where higher values are colored green and lower values are colored white. Save the rule to see the changes applied. As you can see, for the third round of the 2021 FIA Formula 1 World Championship, the fastest lap average speed was set by Valtteri Bottas at a blistering 209.74 km/h (130.32 mph)!

And there we have it. I hope this brief introduction to Conditional Formatting has highlighted (pun intended) the capabilities of this exciting new feature! In this post we’ve only scratched the surface of what’s possible: Conditional Formatting has many more powerful operators than we have demonstrated here, including matching values by range, regular expression, and even rank. Why not take it for a test drive yourself to see what is possible?

If you haven’t tried Charts yet, it’s quick, easy, and free to get started. Simply sign up for MongoDB Cloud, deploy a free Atlas cluster, and click Charts in the top navigation bar. You can also ask questions on the MongoDB Developer Community Forums, or suggest new or improved features using the MongoDB Feedback Engine.

May 13, 2021
Updates

Introducing: Atlas Operator for Kubernetes

The MongoDB Enterprise Operator serves to automate and manage MongoDB clusters on self-managed infrastructure. While this integration has provided complete control over self-managed MongoDB deployments from a single Kubernetes control plane, we’re taking it a step further by extending this functionality to our fully managed database, MongoDB Atlas. We’re excited to introduce the trial version of the Atlas Operator for Kubernetes. The Atlas Operator will allow you to manage all your MongoDB Atlas clusters without ever having to leave Kubernetes. Keep your workflow as seamless and optimized as possible by managing the lifecycle of your cloud-native applications from wherever you work most.

With the trial version of this Atlas Operator, you can provision and deploy fully managed MongoDB Atlas clusters on the cloud provider of your choice through Kubernetes. This is especially important for those seeking to unlock the power of multi-cloud with unique tools and services native to AWS, Google Cloud, and Azure without any added complexity to the data management experience. With this new Atlas Operator, you get the best of all clouds with multi-cloud clusters on Atlas, coupled with the freedom to run your entire stack anywhere, all while managed in one central location. The “trial version” simply means it has all the core functionality to provision fully managed Atlas clusters, but the bells and whistles are yet to come. In addition to encapsulating core Atlas functionality, it ensures Kubernetes Secrets are created for each database user, which allows for easier management of sensitive data. The Atlas Operator also allows you to create IP bindings so your applications can securely access clusters. If you’re interested in using the trial version of the Atlas Operator today, follow our quickstart guide below to get started!

Quickstart

Below you’ll find the steps to create your first cluster in Atlas using the Atlas Operator. Note that you need a running Kubernetes cluster before deploying the Atlas Operator.

1. Register for or log in to Atlas and create API keys for your organization. This information, together with the organization ID, will be used to configure the Atlas Operator’s access to Atlas.

2. Deploy the Atlas Operator:

kubectl apply -f \
  https://raw.githubusercontent.com/mongodb/mongodb-atlas-kubernetes/main/deploy/all-in-one.yaml

3. Create a Secret containing the connection information from step one.
This Secret will be used by the Atlas Operator to connect to Atlas:

kubectl create secret generic mongodb-atlas-operator-api-key \
  --from-literal="orgId=<the_atlas_organization_id>" \
  --from-literal="publicApiKey=<the_atlas_api_public_key>" \
  --from-literal="privateApiKey=<the_atlas_api_private_key>" \
  -n mongodb-atlas-system

4. Create the AtlasProject Custom Resource:

cat <<EOF | kubectl apply -f -
apiVersion: atlas.mongodb.com/v1
kind: AtlasProject
metadata:
  name: my-project
spec:
  name: Test Atlas Operator Project
  projectIpAccessList:
    - ipAddress: "0.0.0.0/0"
      comment: "Allowing access to database from everywhere (only for Demo!)"
EOF

5. Create the AtlasCluster Custom Resource:

cat <<EOF | kubectl apply -f -
apiVersion: atlas.mongodb.com/v1
kind: AtlasCluster
metadata:
  name: my-atlas-cluster
spec:
  name: "Test-cluster"
  projectRef:
    name: my-project
  providerSettings:
    instanceSizeName: M10
    providerName: AWS
    regionName: US_EAST_1
EOF

You’ll have to wait until the cluster is ready, i.e. until the "status" field shows "ready:true":

kubectl get atlasclusters my-atlas-cluster \
  -o=jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
True

6. Create a Secret for the password that will be used to log into the Atlas cluster database:

kubectl create secret generic the-user-password \
  --from-literal="password=P@@sword%"

7. Create the AtlasDatabaseUser Custom Resource (it references the password Secret):

cat <<EOF | kubectl apply -f -
apiVersion: atlas.mongodb.com/v1
kind: AtlasDatabaseUser
metadata:
  name: my-database-user
spec:
  roles:
    - roleName: "readWriteAnyDatabase"
      databaseName: "admin"
  projectRef:
    name: my-project
  username: theuser
  passwordSecretRef:
    name: the-user-password
EOF

8. Shortly afterwards, the Atlas Operator will create a Secret containing the data necessary to connect to the Atlas cluster. You can mount it into your application Pod and read the connection strings from a file or from an environment variable.

kubectl get secrets/test-atlas-operator-project-test-cluster-theuser \
  -o=jsonpath="{.data.connectionString.standardSrv}" | base64 -d
mongodb+srv://theuser:P%40%40sword%25@test-cluster.peqtm.mongodb.net

Stay Tuned for More

Be on the lookout for updates in future blog posts! The trial version of the MongoDB Atlas Operator is currently available on multiple marketplaces, but we’ll be looking to make enhancements in the near future. For more information, check out our MongoDB Atlas & Kubernetes GitHub page and our documentation.

April 8, 2021
Updates

MongoDB Connector for Apache Kafka 1.5 Available Now

Today, MongoDB has released version 1.5 of the MongoDB Connector for Apache Kafka! This article highlights some of the key features of the new release, which also continues to improve the overall quality and stability of the connector.

DeleteOne write model strategy

When messages arrive on Kafka topics, the MongoDB Sink Connector reads them and by default will upsert them into the MongoDB cluster specified in the sink configuration. But what if you don’t always want to upsert them? This is where write model strategies come in: they give you the flexibility to define what to do with each document. While the concept of write model strategies is not new to the connector, this release adds a new strategy called DeleteOneBusinessKeyStrategy. It is useful when a topic contains records identifying data that should be removed from a collection in the MongoDB sink. Consider the following: you run an online store selling fashionable face masks. As part of your architecture, the website sends orders to a Kafka topic, “web-orders”, which upon message arrival kicks off a series of actions such as sending an email confirmation and inserting the order details into an “Orders” collection in a MongoDB cluster.

A sample Orders document:

{
  _id: ObjectId("6053684f2fe69a6ad3fed028"),
  'customer-id': 123,
  'order-id': 100,
  order: { lineitem: 1, SKU: 'FACE1', quantity: 1 }
}

This process works great. However, when a customer cancels an order, we need another business process to update our inventory, send the cancellation email, and remove the order from our MongoDB sink. In this scenario, a cancellation message is sent to another Kafka topic, “canceled-orders”. For messages in this topic, we don’t just want to upsert them into a collection; we want to read the message from the topic and use a field within the document to identify the documents to delete in the sink. For this example, let’s use the order-id key field and define a sink connector using the DeleteOneBusinessKeyStrategy as follows:

"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "FaceMaskWeb.OrderCancel",
"connection.uri": "mongodb://mdb1",
"database": "FaceMaskWeb",
"collection": "Orders",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.DeleteOneBusinessKeyStrategy",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
"document.id.strategy.partial.value.projection.type": "AllowList",
"document.id.strategy.partial.value.projection.list": "order-id",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": false,
"document.id.strategy.overwrite.existing": true

Now when messages arrive in the “FaceMaskWeb.OrderCancel” topic, the “order-id” field is used to delete documents in the Orders collection. For example, using the sample document above, putting this value into the OrderCancel topic:

{ "order-id": 100 }

would cause the document in the Orders collection with an order-id value of 100 to be deleted. For a complete list of write model strategies, check out the MongoDB Kafka Connector sink documentation.

Qlik Replicate

Qlik Replicate is recognized as an industry leader in data replication and ingestion. With this new release of the connector, you can now replicate and stream heterogeneous data from data sources like Oracle, MySQL, Postgres, and others to MongoDB via Kafka and the Qlik Replicate CDC handler. To configure the MongoDB Connector for Apache Kafka to consume Qlik Replicate CDC events, use “com.mongodb.kafka.connect.sink.cdc.qlik.rdbms.RdbmsHandler” as the value for the change data capture handler configuration parameter. The handler supports insert, refresh, read, update, and delete events.
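For reference, here is a hedged sketch of what such a sink configuration could look like. The connector name, topic, database, and collection are placeholders, and change.data.capture.handler is assumed here to be the configuration parameter referred to above.

{"name": "mongo-sink-qlik-cdc",
 "config": {
   "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
   "tasks.max": "1",
   "topics": "(QLIK REPLICATE TOPIC)",
   "connection.uri": "(MONGODB SINK CONNECTION STRING)",
   "database": "(TARGET DATABASE)",
   "collection": "(TARGET COLLECTION)",
   "change.data.capture.handler": "com.mongodb.kafka.connect.sink.cdc.qlik.rdbms.RdbmsHandler"
}}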
Errant Record Reporting

Kafka Connect, the service that manages connectors integrating with a Kafka deployment, has the ability to write records to a dead letter queue (DLQ) topic if those records cannot be serialized or deserialized. Starting with Apache Kafka version 2.6, support was added for error reporting within sink connectors. This gives sink connectors the ability to send individual records to the DLQ if the connector deems the records to be invalid or problematic; for example, if you are projecting fields in the sink that do not exist in the Kafka message, or if your sink is expecting a JSON document and the message arrives in a different format. In these cases an error is written to the DLQ instead of failing the connector.

Various Improvements

As with every release of the connector, we are constantly improving its quality and functionality, and this release is no different. You’ll also see pipeline errors now showing up in the Connect logs, and the sink connector can now be configured to write to the dead letter queue!

Next Steps

Download the latest MongoDB Connector for Apache Kafka 1.5 from the Confluent Hub! Read the MongoDB Connector for Apache Kafka documentation. Questions or need help with the connector? Ask the community. Have a feature request? Provide feedback or file a JIRA.

April 7, 2021
Updates

Global, Multi-Cloud Security at Scale with MongoDB Atlas

In October 2020, we announced the general availability of multi-cloud clusters on MongoDB Atlas. Since then, we’ve made several key improvements that allow customers to take advantage of the full breadth of MongoDB Atlas’ best-in-class data security and privacy capabilities across clouds on a global scale.

Cross-Cloud Security with MongoDB Atlas

A common question we get from customers about multi-cloud clusters is how security works. Each cloud provider offers protocols and controls to ensure that data within its ecosystem is securely stored and accessed. But what happens when your data is distributed across different clouds? Don’t worry: we have you covered. MongoDB Atlas is designed to ensure that our built-in best practices are enforced regardless of which cloud providers you choose to use, from dedicated network peering connections to customer-managed keys for data encryption at rest and client-side field-level encryption.

Private Networking to Multiple Clouds

You can now create multiple network peering connections and/or private endpoints for a multi-cloud cluster to access data securely within each cloud provider. For example, say your operational workload runs on Azure, but you want to set up analytics nodes in Google Cloud and AWS so you can compare the performance of Datalab and SageMaker for machine learning. You can set up network peering connections for all three cloud providers in Atlas to allow each of your cloud environments to access cluster data in their respective nodes using private networks. For more details, take a look at our documentation on network peering architecture.

Integrate with Cloud KMS for Additional Control Over Encryption

Any data stored in Atlas can be encrypted with an external key from AWS KMS, Google Cloud KMS, or Azure Key Vault for an extra layer of encryption on top of MongoDB’s built-in encrypted storage engine. You can also configure client-side field level encryption (client-side FLE) with any of the three cloud key management services to further protect sensitive data by encrypting document fields before they even leave your application (support for Azure Key Vault and Google Cloud KMS is available in beta with select drivers). This means data remains encrypted even while it is in memory and in use within your live database. Even though the data is encrypted, it remains queryable by the application but is inaccessible to any administrators running the database or underlying cloud infrastructure for you. Beyond security, client-side FLE is also a great way to comply with right-to-erasure requests that are part of modern privacy regulations such as the GDPR or the CCPA: you simply destroy the user’s encryption key and their PII becomes unreadable and irrecoverable in memory, on disk, in logs, and in backups.

For multi-cloud clusters, this means you can take advantage of multiple layers of encryption that use keys from different clouds. For example, you can have PII data encrypted client-side with AWS KMS keys, then stored in both an AWS and a Google Cloud region on Atlas and further encrypted at rest with a key managed via Azure Key Vault.

Global, Multi-Cloud Clusters on MongoDB Atlas

For workloads that reach users across continents, our customers leverage Global Clusters. This gives you the unique ability to shard clusters across geographic zones and pin documents to a specific zone.
Now that Atlas is multi-cloud, you can choose from the nearly 80 available regions across all three providers, expanding the potential reach of your client applications while making it easy to comply with data residency regulations. Consider a sample scenario where you’re based in the US and want to expand to reach audiences in Europe. To comply with the GDPR, you must store EU customer data within that region. With Global Clusters, you can configure a multi-cloud cluster with a US zone and an EU zone. In the US, you choose to run on AWS, but in Europe, you decide to go with Azure because it has more available regions. All of this can be configured in minutes using the Atlas UI: simply define your zones and ensure that your documents contain a location field that dictates which zone they should be stored in. For more details, follow our tutorial for how to configure a multi-cloud Global Cluster on Atlas.

Future-Proof Your Applications with Multi-Cloud Clusters

There are many reasons why companies are considering a multi-cloud strategy, from cross-cloud resiliency to geographical reach to being able to leverage the latest tools and services on the market. With MongoDB Atlas, you get best-in-class data security, operations, and intuitive admin controls, regardless of how many cloud providers you want to use. To learn more about how to deploy a multi-cloud cluster on MongoDB Atlas, check out our step-by-step tutorial, which includes best practices for node distribution, instructions for how to test failing over to another cloud, and more.

Safe Harbor

The development, release, and timing of any features or functionality described for our products remains at our sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision, nor is it a commitment, promise, or legal obligation to deliver any material, code, or functionality.

April 7, 2021
Updates
