MongoDB Developer

Coding with MongoDB - news for developers, tips and deep dives

Considering NoSQL? Let's Break Down Your Options

Non-relational alternatives to relational databases — usually referred to as NoSQL databases — have been rapidly gaining popularity over the past decade. In 2013, MongoDB published one of our most popular white papers, “Top 5 Considerations When Evaluating NoSQL Databases.” We have since updated that paper as the technology has evolved. MongoDB is now offering a major update, which adds two new issues organizations should include in their thinking: how a database handles data generated at the edge by mobile devices, and how a database fits into a broader data platform that includes search and analytics.

If you’re testing the waters of NoSQL databases, then you’re probably familiar with how they’re different from traditional relational databases. The list of things you already know about NoSQL probably looks something like this:

- They use a different data model and query language.
- They have dynamic schemas.
- They scale horizontally.

Beyond those common features, there are significant differences among NoSQL databases. The seven areas of significant difference among your options are:

- Data model (document, graph, key-value, etc.)
- Query model
- Consistency and transactional model
- APIs
- Mobile data
- Data platform
- Commercial support, community strength, and lock-in

From MongoDB’s point of view, the most important consideration is the data model. We popularized the document model, which supports a superset of all data models, making it useful for a wide variety of applications. Key features include the ability to index and query on any field, and the natural mapping of document data structures to objects in modern programming languages. (A brief sketch of what this looks like in practice appears at the end of this post.)

Recent shifts in how modern applications are developed and deployed — and in the experiences they offer customers — highlight the two new considerations.

Mobile use cases: Mobile applications introduce the added challenge of not always being connected to the network. Developers need a solution for keeping all their customers’ apps in sync with the back-end database, no matter where users are in the world and what kind of network connection they have. The solution also needs to scale easily and quickly as more users download an app, and to support the cutting edge of mobile development technologies as they evolve.

Data platform: MongoDB’s application data platform provides developers a unified interface to serve transactional and operational applications alongside search, real-time, and data lake application needs. It eliminates the overhead and friction of developers having to stitch together multiple discrete technologies into a complex architecture, each creating its own duplicated data silo — connected by fragile ETL pipelines — and accessed, secured, governed, and operationalized by different APIs and tools.

For a deep dive into all the differences among NoSQL databases, download our white paper, “Top 7 Considerations When Evaluating NoSQL Databases.”
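As promised above, here is a minimal mongosh sketch of what the document model buys you in practice. The collection and field names are illustrative, not from any particular application:

```javascript
// One document holds related data that would span several relational tables,
// and it maps directly to an object in your programming language.
db.customers.insertOne({
  name: "Ada Lovelace",
  addresses: [{ city: "London", country: "UK" }],
  loyalty: { tier: "gold", points: 1200 }
});

// Any field, top-level or nested, can be indexed and queried.
db.customers.createIndex({ "loyalty.tier": 1 });
db.customers.find({ "loyalty.tier": "gold", "addresses.city": "London" });
```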

August 2, 2021
Developer

DocumentDB, MongoDB and the Real-World Effects of Compatibility

If there’s confusion in the market for document databases, it probably has to do with how the products are marketed. AWS claims that DocumentDB, its document model database, comes “with MongoDB compatibility.” But the question of how compatible DocumentDB actually is with MongoDB is worth considering.

DocumentDB merely emulates the MongoDB API while running on top of AWS’s cloud-based relational database, Amazon Aurora. And it’s an inconsistent imitator at best: it fails 62% of MongoDB API correctness tests. Even though AWS claims compatibility with MongoDB 4.0, our tests have concluded that its emulator is a mishmash of features going back to MongoDB 3.2, which we released in 2015. The result is that DocumentDB lacks many of the features that come standard in MongoDB. We’ve already published a side-by-side comparison of the feature sets for each solution. Instead of covering the same ground here, we'll explain how some of those differences play out in real-world scenarios.

DocumentDB vs. MongoDB head-to-head comparison

Scaling writes, partitioning data, and sharding

Native sharding enables you to scale out databases horizontally, across multiple nodes and regions. Atlas offers elastic vertical and horizontal scaling, so capacity can follow consumption. DocumentDB does not scale writes or partition data beyond a single node. To ensure consistency, MongoDB uses concurrency control measures to prevent multiple clients from modifying the same piece of data simultaneously.

Replicate and scale beyond a single region

A number of factors are driving the need to distribute workloads to different geographic regions. In some cases, it’s to reduce latency by putting data closer to where it’s being used. In other cases, it’s to store data in a specific geographic zone to help meet data localization requirements. Finally, there’s the need to ensure the availability of data when an entire AWS region suffers an outage. The flexibility to replicate and move workloads as needed is increasingly seen as a business requirement.

By default, however, DocumentDB limits you to just 15 replicas and constrains you to a single region. Newly introduced Global Clusters may look like an answer, but much like “MongoDB compatibility,” the name is potentially misleading. The Global Clusters feature more closely resembles multi-region replication, since it only allows writes to a single primary instead of being able to write to multiple regions. It also requires manual reconfiguration to recover from failures, making it a partial solution at best.

MongoDB Atlas allows true global cluster configurations, so you can deliver capabilities to all your users around the world. At the click of a button, you can place the most relevant data near local application servers across more than 80 global regions to ensure low-latency reads and writes. By defining a geographic location for each document, your teams can more easily meet local privacy and compliance measures. It’s also an insurance policy against being locked into a single public cloud provider.

High resilience, rapid failover, retryable writes

For critical applications, every second of downtime represents a loss of revenue, trust, and reputation. Rapid failover to a different geographic area is necessary when recovery time objectives (RTO) are measured in seconds. DocumentDB failover SLAs can be as high as two minutes, and multi-region failover is not available.
With MongoDB, failover time is typically five seconds, and failover to a different region or cloud provider is also an option.

Write errors can be as costly as downtime. If a write to increment a field is duplicated because a dropped connection failed to notify the client that the write was executed, that extra increment can be very costly, depending on what it represents. With retryable writes, a write can be sent multiple times but applied exactly once. MongoDB has retryable writes. DocumentDB doesn’t.

Integrated text search, geospatial processing, graph traversals

Integrated text search saves time and improves performance because you can run queries across multiple sources. With DocumentDB, data must be replicated to adjacent AWS services, which increases cost and complexity. MongoDB Atlas combines integrated text search, graph traversals, and geospatial processing into a single API and platform. Integrated search with MongoDB Atlas helps drive end-user behavior by serving up relevant results based on what users are looking for or what businesses want to direct them toward.

Hedged reads

Geographically distributed replica sets can also be used to scale read operations and intelligently route queries to the replica set member that’s closest to the user. Hedged reads automatically route queries to the two closest nodes (measured by ping distance) and return results from the fastest replica. This helps minimize situations where queries are waiting on a node that’s already busy. DocumentDB doesn’t offer hedged reads, and it’s more restricted in the number of replicas it allows and in the ability to place workloads in different regions. MongoDB gives you more flexibility when distributing data geographically for hedged reads, since it runs on all of the major public cloud providers.

Online Archive

Putting data in cold storage can be a death knell if accessing it again is too cumbersome or slow. With online archiving, you can tier data across fully managed databases and cloud object storage and query it through a single endpoint. Online archiving automatically archives historical data, reducing operational and transactional data storage costs without compromising query performance. MongoDB has it. DocumentDB doesn’t.

Integrated querying in the cloud

Running separate queries against separate data stores can drain resources and slow queries. The best solution is being able to query and analyze data across all the different databases and storage containers at once. You can do this with integrated querying, where you run a single query to analyze live cloud data and historical data together, in place, for faster insights. With DocumentDB, you have to replicate data to adjacent AWS services. With MongoDB, you can query and analyze data across cloud data stores and MongoDB Atlas in its native format. You can also run powerful, easy-to-understand aggregations through a unified API for a consistent experience across data types.

On-demand materialized views

When you create aggregations, the results are usually written into a new collection, and the entire collection is regenerated each time you run the aggregation. This process consumes CPU and I/O. With the $merge stage, you can incrementally update the generated results collection rather than rebuild it completely: run the aggregation again, and it updates the existing values in place.
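As a minimal sketch of the pattern (collection and field names here are illustrative):

```javascript
// Maintain a "salesByRegion" materialized view from a "sales" collection.
// Re-running this pipeline refreshes the view in place instead of
// rebuilding it from scratch.
db.sales.aggregate([
  { $group: { _id: "$region", total: { $sum: "$amount" } } },
  { $merge: {
      into: "salesByRegion",     // the on-demand materialized view
      whenMatched: "replace",    // refresh summaries that already exist
      whenNotMatched: "insert"   // add regions seen for the first time
  } }
]);
```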
$merge gives you the ability to create collections based on an aggregation and to update those collections efficiently. This functionality allows users to create on-demand materialized views, where the content of the output collection is incrementally updated when the pipeline is run. MongoDB has this capability. DocumentDB does not.

Rich data types

The decimal data type is critical for storing very large or small numbers, such as financial and tax computations, where it’s necessary to emulate decimal rounding exactly. DocumentDB does not support decimal data types or, in turn, lossless processing of complex numeric data, which is a problem for financial and scientific applications. MongoDB does support rich data types like Decimal128, giving you 128 bits of high-precision decimal representation. (A short illustration appears at the end of this post.)

Client-side field-level encryption

Client-side field-level encryption (FLE) reduces the risk of unauthorized access to or disclosure of sensitive data, like personally identifiable information (PII) and protected health information (PHI). Fields are encrypted before they leave the application, which protects data in transit over the network, in database memory, at rest in storage, in backup repositories, and in system logs. DocumentDB does not offer client-side FLE. MongoDB’s client-side FLE provides among the strongest levels of data privacy and security for regulated workloads.

Platform agility

In addition to the feature sets described here, one of the biggest differences between DocumentDB and MongoDB is the degree of freedom you have to move between different platforms. AWS offers seamless movement and minimal friction between services within its own ecosystem. MongoDB makes it easy to replicate data or move workloads to any cloud provider, giving you complete flexibility within the AWS platform as well as outside of it — whether it’s a self-managed MongoDB instance on cloud infrastructure, a full on-premises deployment, or just a local development instance on an engineer’s laptop.

Try MongoDB Atlas for free today!
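As promised above, a minimal mongosh sketch of Decimal128 in action (collection and field names are illustrative):

```javascript
// Decimal128 stores exact decimal values: no binary floating-point drift,
// which matters for money, tax rates, and other financial figures.
db.invoices.insertOne({
  invoiceId: "INV-001",
  total: NumberDecimal("19.99"),
  taxRate: NumberDecimal("0.0825")
});

// Exact-value matching works as you would expect with decimals.
db.invoices.find({ total: NumberDecimal("19.99") });
```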

July 16, 2021
Developer

Fine-Tune Relevance in MongoDB Atlas Search with Function Scoring and Synonyms

MongoDB Atlas Search is an embedded full-text search solution in MongoDB Atlas that gives developers a seamless and scalable experience for building fast, relevance-based application features. We announced its general availability last year at MongoDB.live 2020, and over the past year we’ve introduced many new features, including a visual index builder, a search query tester, custom analyzers, and wildcard path queries. This year at MongoDB.live 2021, we’re excited to highlight two new capabilities that help developers tune the relevance of search results.

See how easy it is to get started with MongoDB Atlas Search in this demo video by Marcus Eagan, Senior Product Manager for Atlas Search.

Building relevance into search results

Understanding the behavior of your users is essential when thinking about search result relevance. People don’t always tell you what they want, and they sometimes use words or phrases that don’t match your content exactly. To cover these scenarios, you can use full-text search features like function scoring and synonyms.

Influence search rankings with function scoring

There are often multiple factors that influence how search results should be ranked. For example, let’s say you have a restaurant finder application. The explicit inputs are things like the user’s location and what they’re searching for, but what’s implied is that they likely want to see highly rated restaurants or ones with more reviews.

What’s Cooking: a sample restaurant finder application using MongoDB Atlas Search

Function scoring allows you to influence the order of results by manipulating the score of each result. In Atlas Search, that means you can take a numeric field in a document and apply a mathematical expression to it. For example, you might want to increase the score of restaurants that are sponsored or have higher star ratings. This can be accomplished within the same search query by simply adding the function option to the score parameter of your query (see the sketch at the end of this post). Learn more about how to use function scores in our developer tutorial.

Show results for more search queries with synonyms

Synonyms are often used to define terms that are semantically similar to each other in order to improve search results. For example, someone searching for “noodles” might want to find results for “spaghetti”, “chow mein”, or “pad thai”. Synonyms can also help with typos, especially on mobile devices and small keyboards.

In Atlas Search, you can define collections of synonyms for a search index via the API. Synonyms can be explicit (one-way) or equivalent (two-way). Explicit synonyms are good for defining relationships between terms that are subsets of each other, like the noodle example above: “spaghetti”, “chow mein”, and “pad thai” are all explicit synonyms for “noodles”, but not for each other (you don’t want results for “chow mein” in a search for “spaghetti”). Equivalent synonyms are often used for terms that have regional variations or are otherwise interchangeable both ways, like soda and pop, or Kleenex and tissues.

What's next for Atlas Search

Developers are increasingly turning to full-text search to make content more discoverable and relevant for application end users. With Atlas Search, we hope to make building full-text search not only easier, but also more powerful and expressive. Join our community to ask questions and find out what other developers are building with Atlas Search, and let us know what you think we should build next in our feedback forums.
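To make both features concrete, here is a minimal sketch, assuming a hypothetical restaurants collection with a numeric rating field and a search index configured to reference a synonyms source collection:

```javascript
// Function scoring: multiply text-relevance by the "rating" field, so
// higher-rated restaurants float to the top of otherwise similar results.
db.restaurants.aggregate([
  { $search: {
      text: { query: "noodles", path: "cuisine" },
      score: {
        function: {
          multiply: [
            { score: "relevance" },
            { path: { value: "rating", undefined: 1 } }  // fallback if missing
          ]
        }
      }
  } },
  { $limit: 10 }
]);

// Synonyms are plain documents in a collection referenced by the index.
// Equivalent (two-way):
db.synonyms.insertOne({ mappingType: "equivalent", synonyms: ["soda", "pop"] });
// Explicit (one-way): a search for "noodles" also matches these terms.
db.synonyms.insertOne({
  mappingType: "explicit",
  input: ["noodles"],
  synonyms: ["spaghetti", "chow mein", "pad thai"]
});
```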

July 13, 2021
Developer

Introducing Serverless Instances on MongoDB Atlas, Now Available in Preview

Since we first launched MongoDB Atlas in June 2016, we’ve been working towards building a cloud database that not only delivers a first-class developer experience, but also simply works: no setup, tuning, or maintenance required. Over the years, this has led to features like auto-scaling and click-to-create index suggestions, along with numerous optimizations to our automation engine. We’re excited to announce that we’re one more step closer to realizing this vision with the introduction of serverless databases on MongoDB Atlas.

Think less about your database, and more about your data

Serverless computing and NoOps have emerged as popular trends in modern application development. Cloud functions are commonly used to power business logic in applications, and many teams rely on completely automated IT operations. The appeal of serverless technology is hard to deny: elastic scaling eliminates the need for upfront resource provisioning and ongoing maintenance, and consumption-based pricing means paying only for the resources you use. It abstracts and automates away many of the lower-level infrastructure decisions that developers don’t want to have to learn or manage, so they can focus on building differentiated features.

When it comes to databases, compute and storage resources have traditionally been tightly coupled. Applying a serverless model to databases means decoupling them and changing the way engineering teams think about infrastructure. Rather than asking a developer to predict an application’s future workload patterns, break them down into individual resource requirements, and then map them to arbitrary units of database instance sizes, serverless databases offer a much simpler experience: define where your data lives, and get a database endpoint you can use. This not only streamlines the database deployment process, it also eliminates the need to monitor and adjust capacity on an ongoing basis. Developers are free to focus on their data rather than their databases, and to leave the lower-level infrastructure decisions to intelligent, behind-the-scenes automation.

Serverless instances on MongoDB Atlas

All customers now have the ability to create a serverless database on MongoDB Atlas with the introduction of serverless instances, announced at MongoDB.live 2021. It’s incredibly easy to get started: simply choose a cloud region and you’ll receive an on-demand database endpoint for your application. Serverless instances always run on the latest MongoDB version, so you never have to worry about backwards compatibility or upgrades. You can view and manage them using the same UI and API as your existing database deployments on Atlas (i.e., clusters), and they come with end-to-end security, continuous uptime, metrics, alerts, and backups.

Watch this demo of how to create a serverless instance on MongoDB Atlas

This new deployment type is available in preview, so it doesn’t yet support all of the features and capabilities available on clusters today. It’s ideal for infrequent or sparse workloads, or for development and testing workloads in the cloud. If you’re running a high-throughput production workload, dedicated clusters are still the recommended deployment option.

A hands-free database experience

This is the first of many releases, and we have an ambitious roadmap ahead.
We will continue to invest in making working with data ever more seamless and delightful for developers, from adding support for newer Atlas capabilities like full-text search and native visualizations, to even more intelligent automation and optimization.

Create your own serverless instance on MongoDB Atlas: Try the Preview

If you have feedback or questions, we’d love to hear them! Join our community forums to meet other MongoDB developers and see what they’re building with serverless instances.

What's next for MongoDB Atlas

Serverless instances are just one of many new additions to Atlas that we hope will make developers’ lives easier. Earlier this year, we added index removal suggestions to Performance Advisor and released a quick start for creating and managing clusters via the command line with the MongoDB CLI. We are also working on integrations with Vercel and Netlify, two popular serverless application platforms, to give developers an easy way to get started on MongoDB Atlas. What would make your development experience better on MongoDB Atlas? Share your feature requests in our feedback forums.

July 13, 2021
Developer

Streaming Time-Series Data Using Apache Kafka and MongoDB

There is one thing the world agrees on, and it is the concept of time. Many applications are heavily time-based. Consider solar field power generation, stock trading, and health monitoring: these are just a few of the plethora of applications that produce and use data with a critical time component. In general, time-series applications are heavy on inserts, rarely perform updates, and are even more unlikely to delete the data. These applications generate a tremendous amount of data and need a robust data platform to effectively manage and query it. With MongoDB, you can easily:

- Pre-aggregate data using the MongoDB Query Language and window functions
- Optimally store large amounts of time-series data with MongoDB time-series collections
- Archive data to cost-effective storage using MongoDB Atlas Online Archive

Apache Kafka is often used as an ingestion point for data due to its scalability. Through the use of the MongoDB Connector for Apache Kafka and the Apache Kafka Connect service, it is easy to transfer data between Kafka topics and MongoDB clusters. Starting in the 1.6 release of the MongoDB Connector for Apache Kafka, you can configure Kafka topic data to be written directly into a time-series collection in MongoDB. This configuration happens in the sink.

Configuring time series collections in the sink

With MongoDB, applications do not need to create the database and collection before they start writing data; these objects are created automatically upon first arrival of data into MongoDB. However, a time-series collection needs to be created before you start writing data. To make it easy to ingest time-series data into MongoDB from Kafka, these collection options are exposed as sink parameters, and the time-series collection is created by the connector if it doesn’t already exist. Some of the new parameters are defined as follows:

timeseries.timefield: Name of the top-level field used for time.

timeseries.expire.after.seconds: This optional field determines the amount of time the data will be in MongoDB before being automatically deleted. Omitting this field means data will not be deleted automatically. If you are familiar with TTL indexes in MongoDB, setting this field provides similar behavior.

timeseries.timefield.auto.convert: This optional field tells the connector to convert the data in the field into a BSON Date format. Supported formats include integer, long, and string.

For a complete list of the new time-series parameters, check out the MongoDB Sink connector online documentation.

When data is stored in time-series collections, MongoDB optimizes the storage and bucketization of your data behind the scenes. This saves a tremendous amount of storage space compared to the typical one-document-per-data-point structure of regular collections. You can also explore the many new time and window functionalities within the MongoDB Query Language. For example, consider this sample document structure:

```javascript
{
  tx_time: 2021-06-30T15:47:31.000Z,
  _id: '60dc921372f0f39e2cd6cba5',
  company_name: 'SILKY CORNERSTONE LLC',
  price: 94.0999984741211,
  company_symbol: 'SCL'
}
```

You can use the new $setWindowFields pipeline stage to define the window of documents to perform an operation on, then compute rankings, cumulative totals, and other analytics over complex time-series data.
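(A note on setup before we query: since a time-series collection must exist before data arrives, here is a minimal mongosh sketch of roughly what the sink parameters above translate to when the connector creates the collection for you. The 30-day retention value is illustrative.)

```javascript
// Create the time-series collection manually; the connector does the
// equivalent of this from its sink configuration.
db.createCollection("StockDataTS", {
  timeseries: {
    timeField: "tx_time"                   // maps to timeseries.timefield
  },
  expireAfterSeconds: 60 * 60 * 24 * 30    // maps to timeseries.expire.after.seconds
});
```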
For example, using the data generated in the tutorial, let’s compute a rolling average with $setWindowFields:

```javascript
db.StockDataTS.aggregate([
  { $match: { company_symbol: 'SCL' } },
  { $setWindowFields: {
      partitionBy: '$company_name',
      sortBy: { tx_time: 1 },
      output: {
        averagePrice: {
          $avg: '$price',
          window: { documents: ['unbounded', 'current'] }
        }
      }
  } }
])
```

A sample of the result set is as follows:

```javascript
{
  tx_time: 2021-06-30T15:47:45.000Z,
  _id: '60dc922172f0f39e2cd6cbeb',
  company_name: 'SILKY CORNERSTONE LLC',
  price: 94.06999969482422,
  company_symbol: 'SCL',
  averagePrice: 94.1346669514974
},
{
  tx_time: 2021-06-30T15:47:47.000Z,
  _id: '60dc922372f0f39e2cd6cbf0',
  company_name: 'SILKY CORNERSTONE LLC',
  price: 94.1500015258789,
  company_symbol: 'SCL',
  averagePrice: 94.13562536239624
},
{
  tx_time: 2021-06-30T15:47:48.000Z,
  _id: '60dc922472f0f39e2cd6cbf5',
  company_name: 'SILKY CORNERSTONE LLC',
  price: 94.0999984741211,
  company_symbol: 'SCL',
  averagePrice: 94.13352966308594
}
```

Notice that the additional “averagePrice” field is now populated with a rolling average. For more information on time-series collections in MongoDB, check out the online documentation.

Migrating existing collections

To convert an existing MongoDB collection to a time-series collection, you can use the MongoDB Connector for Apache Kafka. Simply configure the source connection to your existing collection, and configure the sink connector to write to a MongoDB time-series collection by using the “timeseries.timefield” parameter. You can configure the source connector to copy existing data by setting the “copy.existing” parameter to true. This will create insert events for all existing documents in the source. Any documents that are inserted during the copying process will be inserted once the copying process has finished. While not always possible, it is recommended to pause writes to the source data while the copy process is running. To see when it finishes, check the logs for the message “Finished copying existing data from the collection(s).”.

For example, consider a source document with this structure:

```javascript
{
  company_symbol: (STRING),
  company_name: (STRING),
  price: (DECIMAL),
  tx_time: (STRING)
}
```

In the initial release of MongoDB time-series collections, the field that represents the time is required to be stored as a Date. In our example, we are using a string to showcase the connector’s ability to automatically convert a string to a Date. If you choose to perform the conversion outside of the connector, you could use a Single Message Transform (SMT) in Kafka Connect to convert the string into a Date at the sink. However, certain SMTs like TimestampConverter require schemas to be defined for the data in the Kafka topic in order to work, which may add some complexity to the configuration. Instead of using an SMT, you can convert to Dates automatically using the new timeseries.timefield.auto.convert and timeseries.timefield.auto.convert.date.format options.
Here is a sample source configuration that will copy all the existing data from the StockData collection and then continue to push data changes to the stockdata.Stocks.StockData topic:

```json
{
  "name": "mongo-source-stockdata",
  "config": {
    "tasks.max": "1",
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "publish.full.document.only": true,
    "connection.uri": (MONGODB SOURCE CONNECTION STRING),
    "topic.prefix": "stockdata",
    "database": "Stocks",
    "collection": "StockData",
    "copy.existing": "true"
  }
}
```

This is a sample configuration for the sink to write the data from the stockdata.Stocks.StockData topic to a MongoDB time-series collection:

```json
{
  "name": "mongo-sink-stockdata",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "tasks.max": "1",
    "topics": "stockdata.Stocks.StockData",
    "connection.uri": (MONGODB SINK CONNECTION STRING),
    "database": "Stocks",
    "collection": "StockDataMigrate",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "timeseries.timefield": "tx_time",
    "timeseries.timefield.auto.convert": "true",
    "timeseries.timefield.auto.convert.date.format": "yyyy-MM-dd'T'HH:mm:ss'Z'"
  }
}
```

In this sink example, the connector will convert the data in the “tx_time” field into a Date, parsing it with the expected string format yyyy-MM-dd'T'HH:mm:ss'Z' (e.g., '2021-07-06T12:25:45Z').

Note that in the initial version of time-series collections, only inserts into a time-series collection are supported. Updating or deleting documents on the source will not propagate to the destination. Also, you cannot use the MongoDB CDC Handler in this scenario, because the handler uses ReplaceOne, which is a type of update command. These are limitations of the initial release of time-series collections in MongoDB and may no longer apply by the time you read this post; check the online documentation for the latest information.

The MongoDB Connector for Apache Kafka version 1.6 is available to download from GitHub. Look for it on the Confluent Hub later this week!

July 13, 2021
Developer

Visualize Blended Atlas and AWS S3 Data From Atlas Data Lake with MongoDB Charts

We’re excited to announce that MongoDB Charts supports Atlas Data Lake as a data source! You can now use Charts to easily visualize data stored across different Atlas databases and AWS S3 buckets. Thanks to the aggregating power of Atlas Data Lake’s federated query, creating charts and graphs from blended application and cloud object data is simpler than ever before.

On the surface, this powerful integration is as simple as adding your Atlas Data Lake as a data source within Charts. However, it unlocks a deeper level of analysis while eliminating the need to create an Extract-Transform-Load (ETL) process across your Atlas and S3 data. The integration provides the ability to visualize data from the following combinations of sources without writing any code:

- Data from many Atlas databases or clusters, including multi-cloud clusters
- Cloud storage data from AWS S3
- Blended Atlas and cloud storage (AWS S3) data

Scenario: Finding insights from aggregated customer profile and contract data

Let’s consider a real-world scenario of how this can enhance the analytics you derive from your data. While doing so, we’ll walk through the steps of setting up your Atlas Data Lake, adding it as a data source to Charts, and getting the most out of your data with Charts’ powerful visualization capabilities.

For context, imagine we’re analysts at a telecom company. First, we have contract data stored in MongoDB Atlas, in different clusters and databases for each country we operate in: the United States and Canada. Second, we have offloaded data from our Customer Relationship Management (CRM) tool as a parquet file into an AWS S3 bucket. All three datasets share a common “customerID” field.

Configure Atlas Data Lake

Because both “contracts” collections (or datasets) in MongoDB Atlas share the same fields, I simply mapped both into a single collection within the data lake. I mapped the customer profiles dataset into its own collection, since it only shares the “customerID” field. However, now that it’s in the same data lake, I can easily join it to my contract data with a $lookup in my Charts aggregation pipeline or with a Lookup Field in the chart builder. (A $lookup in the MongoDB Query API is equivalent to a join in SQL; a sketch follows below.)

Configure the Charts data source

In this scenario, I want to find insights from all contracts, both US and Canadian. Once I have created a single Atlas Data Lake collection (DL_contracts.allcontracts) from the two separate databases, I need to add it as a data source in Charts. Simply click “add data source” within Charts, add your data lake, and then choose the collections to use in the next step. For completeness, I also added the two Atlas collections (US and Canada contracts) as data sources in Charts by following the same steps.

Visualize data across multiple Atlas databases

With Atlas Data Lake’s federated query capability, which effectively performs a union of data, I can build a column chart that shows the amount of all US and CA contracts in a single chart without writing any code. As you can see below, the chart shows both US and CA columns when connected to the data lake collection. When the data source is switched directly to either Atlas database, it only shows data for that respective database, or country in this example.

Visualize blended data from Atlas and an AWS S3 bucket

Lastly, let’s take our insights to the next level by visualizing data from multiple Atlas databases and a parquet file stored in an AWS S3 bucket.
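The join that powers this blended view is the $lookup stage mentioned earlier, placed in the Charts data-source aggregation pipeline. A minimal sketch, assuming illustrative collection names and the shared customerID key:

```javascript
// Charts data-source pipeline: join contracts to the customer-profile
// collection that Atlas Data Lake mapped from the S3 parquet file.
[
  { $lookup: {
      from: "customer_profiles",   // illustrative name for the S3-backed collection
      localField: "customerID",
      foreignField: "customerID",
      as: "profile"
  } },
  { $unwind: "$profile" }          // one row per contract/profile pair
]
```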
Adding customer profile data that I offloaded from my CRM tool into S3 enables me to find more robust insights. (I could also visualize data from the parquet file alone by connecting to that data lake collection.) Since the contract data and customer profile data are in different collections within my Atlas Data Lake, I created a $lookup in the aggregation pipeline of the Charts data source, as sketched above. I then created a table chart from three different data sources, with conditional formatting to quickly identify high-value customers. The columns with blue boxes include contract data from both Atlas clusters, while the columns with orange boxes include customer profile data from the parquet file in the AWS S3 bucket.

Note that I could also aggregate the data in Atlas Data Lake and use $out to create a new collection of the blended data, and then connect Charts to that new collection as a data source. For the purposes of this blog, I wanted to highlight Charts-specific aggregation capabilities.

We hope you’re excited about the ability to easily visualize multiple data sources, from multiple Atlas databases to AWS S3 buckets, in one place! Remember, if you haven’t used Charts before, you can get started for free by signing up for MongoDB Cloud, deploying an Atlas cluster, and activating Charts.

Try MongoDB Atlas for free today!

July 9, 2021
Developer

Default Majority Write Concern: Providing Stronger Durability Guarantees "Out of the Box"

MongoDB has always provided developers with the flexibility to control the desired level of write durability, enabling them to balance latency and throughput against the application’s SLAs. The chosen write concern dictates how many nodes in the MongoDB database cluster need to acknowledge a write before success is passed back to the application. You can configure the write concern either in the driver or via a cluster-wide global setting on the server.

Prior to the 5.0 release, MongoDB defaulted to the w:1 write concern, which waits for the write to be applied to memory on the primary node before acknowledging success back to the application. This default was adequate to meet the durability requirements of the most common applications built on MongoDB. Over the past several years, that application profile has extended as the MongoDB database has evolved, especially in serving more mission-critical, transactional applications. At the same time, we have made replication faster, so enforcing data durability across multiple nodes in a distributed database cluster no longer imposes the performance trade-offs of the past.

It is for these reasons that, starting with MongoDB 5.0, the default durability guarantee has been elevated to the majority (w:majority) write concern. This means that write success will now be acknowledged to the application only once the write has been committed and persisted to disk on a majority of replicas. Compared with the former w:1 default, the new default provides a stronger durability guarantee: acknowledged data can survive replica set elections and complete node failures. The new w:majority default setting is fully tunable, so you can maintain the earlier w:1 default or any custom write concern you had previously configured. In this blog post, I will dig into why we decided to move to w:majority as the new default, and explain how it works as you upgrade an existing MongoDB cluster or deploy a new MongoDB environment.

Figure 1: MongoDB w:majority providing multi-node durability by default

w:1 to w:majority default concern: A decision rooted in key MongoDB milestones

The decision to make w:majority the default “out of the box” write concern is deeply rooted in our technology evolution and in the experience that comes from the increased sophistication of our users’ workloads.

With the release of MongoDB 4.0 (June 2018), we added support for multi-document ACID transactions. This made it extremely easy for developers to address a complete new range of modern system-of-record applications with MongoDB. Transactional guarantees were initially scoped to replica sets and then extended 12 months later to sharded clusters with the MongoDB 4.2 release.

In May 2019, the default MongoDB Atlas connection string used by the drivers to connect to the database was changed to w:majority. This effectively eliminated any manual tuning by Atlas users who required stronger durability guarantees. It also enabled us to evaluate whether users maintained those stronger guarantees or dialed them down to the w:1 primary-acknowledged concern. The vast majority of users stuck with the stronger write concern.

In MongoDB 4.4 (June 2020), we radically improved replication performance with the introduction of streaming replication and “replicate-before-journaling”. Rather than replicas polling the primary and receiving batches of events to apply locally, streaming replication allows the primary to continuously stream messages to the replicas.
Coupled with replicate-before-journaling, which allows secondary nodes to read a primary node's oplog entries before they are locally journaled, these changes reduce the latency of majority-committed writes by up to 50% over high-load networks.

Throughout the years, our product and engineering teams have studied the benefits of all these innovations, and specifically of w:majority, for our user base. One fundamental principle of w:majority is that the time taken to perform writes server-side does not really change. The added time it takes for the driver to acknowledge the write operation on the client side strengthens the durability the application can expect for each write and improves the overall health and performance of the MongoDB cluster. The new MongoDB 5.0 w:majority default simply reflects how users have extended their use of MongoDB to modernize mission-critical applications.

Upgrade considerations

For users who do not set any write concern and instead rely on the defaults that MongoDB provides, w:majority will become the default write concern starting in MongoDB 5.0. This new default will take effect without any user action. The new default does not override any explicit write concerns that developers have configured previously. Developers who want their application to continue to use the write concern w:1 should explicitly set their default write concern to w:1 prior to upgrade, whether in their driver or server-side with a global write concern, to maintain the previous behavior (see the sketch at the end of this post).

Special considerations for MongoDB clusters with arbiters

First, we should remind users that, as a general practice, we recommend against using arbiters in production clusters. For MongoDB clusters that do use arbiters, if the loss of a single data-holding node would leave majority writes unable to succeed, then MongoDB 5.0 defaults the write concern to w:1; otherwise, it defaults to w:majority. This translates to a default write concern of w:1 for primary-secondary-arbiter (PSA) configurations, but a default write concern of w:majority for cluster configurations like PSSSA. This prevents users with configurations like PSA from experiencing write unavailability when one data-bearing node is unavailable.

Latency and throughput

When comparing the latency and throughput of w:1 and w:majority, there are many application- and environment-specific factors that determine the impact on performance. With the replication enhancements delivered in MongoDB 4.4, users upgrading from earlier MongoDB releases will see little to no latency impact from the higher durability guarantees in MongoDB 5.0, and ensuring you have sufficient client concurrency should yield little to no impact on overall throughput. We will go over those factors and share the results of our performance tests in a future blog post; however, the best way to evaluate the latency and throughput of different write concerns is to test in your own environment.

Getting started with MongoDB 5.0

We encourage all MongoDB users to evaluate our new 5.0 release to experience the benefits of the new w:majority default and everything else MongoDB 5.0 offers.
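As referenced above, here is a minimal sketch of explicitly setting a write concern, either cluster-wide or per operation. The database, collection, and values are illustrative:

```javascript
// Cluster-wide default (available since MongoDB 4.4), run via mongosh:
db.adminCommand({
  setDefaultRWConcern: 1,
  defaultWriteConcern: { w: 1 }   // keep the pre-5.0 behavior
});

// Or per operation, overriding whatever default is in effect:
db.orders.insertOne(
  { orderId: 12345, status: "placed" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
);
```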

July 8, 2021
Developer

Import and Export Your Charts Dashboards

With the latest release of MongoDB Charts, we’ve added the ability to export any dashboard to a file, as well as to import those files back into a Charts project. To export a dashboard, simply choose Export Dashboard from the dashboard’s tile on the main Dashboards page. To import a dashboard, choose the command from the menu next to Add Dashboard. Let’s look at some things you can do with this new capability.

Copy dashboards between projects

MongoDB Cloud allows you to create multiple projects, each of which has its own Atlas cluster. There are a bunch of reasons to use multiple projects, but one common example is to use them for the different environments of an application, such as Development, QA, or Production. Each Charts dashboard also lives within a project, and until now there was no way of moving or copying a dashboard between projects. This could be problematic if a dashboard created in the Development project needed to be promoted to QA or Production. With the new Import/Export feature, you can simply export a dashboard from one project and import it into another.

Version control your dashboards

Taking this example one step further, now that you can export your dashboards to a file, you can treat them as code. That allows you to store the dashboard definitions in a source control system, making it easy to track changes, go back to specific versions, and keep the dashboards stored safely alongside the other code artefacts used in your solution.

Share dashboards with the community

While some dashboards only make sense when connected to your own private data, others may be built on a commonly available schema, whether that’s the Atlas sample data, some open data from the web, or data created by a reusable script. Once you’ve built a great dashboard using this generally available data, why not export it and share it with the world?

Copy dashboards and change their data sources

Whenever you import a dashboard from a file, Charts will give you the opportunity to “remap” the data sources used on the dashboard. This is important because the data in the new project might not match what was in the original project. You can use this feature to your advantage if you want to quickly change the data sources used on a dashboard, even if you are importing back into the same project. As an example, suppose you are a multinational company and use a different collection to track sales in each country you operate in. You could build a dashboard with a bunch of great charts, all linked to your “US Sales” collection. If you then wanted to build an equivalent dashboard for your Australian sales, you could simply export the US dashboard, reimport it, and remap the data sources on import to the “Australian Sales” collection.

Migrate from Charts on-prem

Finally, this feature provides a great option for Charts on-prem users who want to move to the cloud and take advantage of all of the new features only available to cloud users. While the on-prem version of Charts does not have the Export feature, on-prem users can contact MongoDB Support to obtain a script that will generate export files for on-prem dashboards. Those files can then be imported into your MongoDB Cloud projects using the new Import feature.

We hope you’re as excited about this feature as we are! Remember, if you haven’t used Charts before, you can get started for free by signing up for MongoDB Cloud, deploying an Atlas cluster, and activating Charts.

June 24, 2021
Developer

Deploy and Manage MongoDB Atlas from AWS CloudFormation

As a premier launch partner for the recent GA announcement of the AWS CloudFormation Public Registry, we’re delighted to share that you can now deploy and manage MongoDB Atlas directly from your AWS environment.

Amazon and MongoDB have been pioneers in the cloud computing space, providing mission-critical systems for over a decade. Before MongoDB Atlas was launched in June 2016, tens of thousands of customers were running MongoDB themselves on AWS EC2 instances, and many of those instances were originally spun up using the legacy “MongoDB on the AWS Cloud” Quick Start Reference Deployment. This Quick Start was among the top five most popular guides for AWS, and it allows users to take advantage of AWS CloudFormation's seamless automation and MongoDB’s flexible data model and expressive query API.

In April 2021, we launched a new AWS Quick Start for MongoDB Atlas, which allows AWS customers to quickly and easily launch a basic MongoDB Atlas deployment from the AWS CLI or console. Now, with the availability of the MongoDB Atlas resource types on the CloudFormation Public Registry, customers have more flexibility over their deployment configurations to better fit their cloud workflows. Let’s walk through how it works.

Set up your AWS account for MongoDB Atlas CloudFormation support

The first step is to sign up for MongoDB Atlas, if you haven’t done so already. Once you create your account, follow these steps:

- Skip the cluster deployment options
- Go to Billing and add a credit card to your account
- Create an organization-level MongoDB Atlas Programmatic API Key with an IP Access List entry. The key needs Organization Project Creator permissions.

Next, open the AWS console in your browser and navigate to CloudFormation. In the left-side navigation, select the Public extensions option. From there, you can find the MongoDB Atlas resource types by selecting the “Resource Types” and “Third Party” options. For each of the MongoDB::Atlas resource types, click “Activate” and follow the on-screen prompts to complete the process. Once you have activated the MongoDB Atlas resources in a region, you’re ready to launch apps with MongoDB Atlas directly from your AWS control plane.

Build apps faster with cloud automation

Context switching is a hassle for developers. Launching and deploying application stacks with MongoDB Atlas directly from the AWS console is now more seamless than ever. Whether you use the AWS Quick Start deployment guide as a template or create your own MongoDB Atlas CloudFormation templates, you can leverage the latest in cloud automation to reduce the pain of infrastructure provisioning and management. Try out the new MongoDB Atlas CloudFormation resources today, and stay tuned for an in-depth look at building apps with AWS Lambda and the SAM CLI in an upcoming DevHub article!

June 21, 2021
Developer
