MongoDB Developer

Coding with MongoDB - news for developers, tips and deep dives

10 Signs Your Data Architecture is Limiting Your Innovation: Part 1

For most businesses, the data layer is usually out of sight and out of mind. But that approach can mask the cause of major challenges facing a business — from slow time to market for new products to spiraling maintenance costs to difficulty focusing on innovation. The truth is, all of those issues are often rooted directly in the complexity of your data architecture. Whether you’re working with a cloud migration that has accumulated dozens of components or a legacy infrastructure that has been jerry-rigged to support modern applications, that complexity consumes your developers’ time and resources, keeps you focused on maintenance, and creates data duplication and integration challenges that eat your budget.

This complexity manifests in many different ways; as those symptoms accumulate, they can become a serious hindrance to your ability to bring innovative ideas to market. We think of the effect as a kind of tax — a tax rooted directly in the complexity of your data architecture. We call it DIRT — the Data and Innovation Recurring Tax. We have identified ten symptoms that can indicate your business is paying DIRT. For an in-depth view, read our white paper 10 Signs Your Data Infrastructure is Holding You Back.

Symptom #1: Insights come by the month or week, not by the minute

How can you serve your customers if you know nothing about them? In today’s world, insights into your customers and their needs are vital to survive, but real-time insights are what give you the competitive edge to thrive. Often, this seems to necessitate a separate database dedicated to analytics, with difficult-to-maintain ETL pipelines shuttling data between databases. But if you’re trying to match real-time insights with real-time behavior, then slow ETL pipelines put you behind before you’ve even started. And analytics databases often struggle to accommodate semi-structured, unstructured, and geospatial data.
Meanwhile, the shape of your data is changing more quickly than your systems can adapt. Many organizations house their analytics in a data warehouse or a data lake. With either one, you still need to move data back and forth, introducing latency, rather than working with data where it resides. The whole process is still too slow to allow real-time analytics and power rich application experiences.

Our solution: MongoDB’s application data platform solves this problem by allowing you to analyze data directly in the database in real time.

Symptom #2: You can't construct a 360-degree view of your customer — or of anyone or anything

What retailer wouldn’t want to have all its customer data, from clickstream to transaction history, in one place? What financial institution wouldn’t want a single view of exposure across asset classes, counterparties, and geographies? A single view, also known as a 360-degree view, can enable customer-service agents to be more helpful more quickly, because they have the data they need. A single view makes it more likely that fraud will be detected while it can still be stopped. And a single view makes it easier to comply with the General Data Protection Regulation (GDPR), because you can see all your customer data in one place. Building a single view generally requires the integration of different types of data from different databases that don’t communicate. Finding one schema that will work for all the data types is extremely difficult, and when you need to add new data, your old schema may not accommodate it.

Our solution: Because MongoDB’s application data platform is built on the document model, it’s ideally suited to building 360-degree views. Our database supports rich customer objects, and it can accommodate any kind of data, no matter its format. These objects can be further enriched at any stage just by adding fields to a document — no schema migration necessary.
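To make the "just add fields" point concrete, here is a minimal sketch using plain JavaScript objects. The field names are hypothetical; in MongoDB the enriched document could be saved as-is, with no migration step.

```javascript
// A hypothetical customer document for a 360-degree view.
const customer = {
  _id: "c1001",
  name: "Ada Lovelace",
  transactions: [{ sku: "A-17", amount: 42.5 }],
};

// Later, clickstream data arrives from another system: just add a field.
customer.clickstream = [{ page: "/checkout", ts: "2021-11-30T10:00:00Z" }];

// Geospatial data, same story (GeoJSON point, queryable in MongoDB).
customer.lastKnownLocation = { type: "Point", coordinates: [-73.99, 40.73] };
```

The document simply grows to absorb new data types; no other records need to change shape.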
For a complete view of DIRT, read our white paper DIRT and the High Cost of Complexity.

December 3, 2021

What is MACH Architecture for ecommerce?

In the past, retailers faced the looming battle of brick-and-mortar vs. digital buying experiences. While most in the retail industry accepted the inevitability of needing some kind of digital experience, COVID-19 forced retailers to refocus efforts on digital-first, or at the very least hybrid digital and in-person, buying options.

What customers expect (and why legacy systems don't hold up)

This leads us to one of the underlying problems for modern retailers: legacy architecture. The digital solutions many depend on aren’t able to meet consumers’ digital-first (or at the very least digital-friendly) ecommerce expectations. Today’s customers expect:

Mobile-friendly architecture - People shop from their phones. If your ecommerce experience was designed web-first, with a mobile component retrofitted to meet buyer demand, you may need to rethink your mobile offering.

Omnichannel experience - Beyond having a mobile-friendly buying experience, consumers want to carry their purchasing power from channel to channel and even into the physical store. Think buying online and picking up in store (BOPIS), or starting an order from your phone and completing it in store, or vice versa.

Dynamic product catalogues - Consumers want ample choice and a smooth search experience. Can your systems hold up with thousands of products all displayed, searchable, managed, updated, and dynamically enriched with discounts, product offerings, and more? Consumers also expect real-time stock availability, both in store and online. They want to know you really have an item in stock at their local store before venturing out to buy it.

Personalization - Personalization is so ingrained in the online retail experience that consumers have come to expect it. They want real-time recommendations for the items they’re interested in, with predictions based on past online purchases and searches, items in their cart, and in-person buying experiences.
Why is it difficult to live up to these expectations? Many in ecommerce are still running monolithic applications built as a single, autonomous unit. This means even the smallest changes, like altering a single line of code or adding a new feature, could require refactoring the entire software stack, leading to downtime and lost business. In addition, the long-term opportunity cost of having your development team waste time simply maintaining and patching such a brittle ecommerce system is a constant drain, or Innovation Tax, on your business.

So retailers face a unique challenge. The thought of overhauling their current systems leads to fears of downtime, expensive investments in new solutions, and ultimately massive loss of profit. But providing an ecommerce experience that lives up to consumer expectations isn’t optional anymore; it’s how your business thrives. That’s where the MACH approach comes in.

MACH Approach: ecommerce modernization with flexibility in mind

So, what’s the MACH approach and, to put it bluntly, why should the retail industry care? The MACH approach, championed by the MACH Alliance, an industry body of which MongoDB is a member, is focused on facilitating the transition from monolithic, legacy ecommerce architectures to modern, streamlined ecommerce applications.

Microservices - Microservices break down specific business functionalities into smaller, self-contained services. Instead of taking your whole application offline to add new shopping cart features, you update specific elements of your architecture without disrupting the entire application. This affords developers a level of flexibility that monolithic systems can’t compete with. Greater developer flexibility means minimal downtime, faster updates, an improved experience for consumers, and ultimately faster time to value for your business.
API-first - APIs, the pieces of code allowing communication between separate applications or microservices, should be at the forefront of solution development instead of an afterthought. An API-first approach to development is just that: APIs are built first, and everything else is developed to preserve the original API for greater consistency and reusability. This approach ensures planning revolves around the end product being consumed by different devices (like mobile) and by client applications through those APIs.

Cloud-native - At this point, saying “the cloud is the future of app development” is cliche; we’re already there. Building and running applications exclusively in the cloud, whether public or private, allows you to reap all the benefits of cloud development from the start. There are also cost-cutting benefits to cloud-native environments. You avoid the investment that often comes with on-prem equipment. Most cloud SaaS options have pay-as-you-go cost structures, ensuring you only pay for what you use and leading to more predictable monthly expenses. Using managed cloud solutions, like MongoDB Atlas, also frees up your development team to focus their efforts where they’re needed most — actually developing your application — instead of sinking valuable time into burdensome administrative tasks.

Headless - If your application is down, even for a minute, you run the risk of the consumer simply moving on to another retail option. Downtime equates to lost profits, so to avoid the dreaded disruption to your revenue stream, take a headless approach to application development. With headless, changes to the front end (web store layout, UX, frameworks, design, etc.) can be made without interruption to back-end operations (products, business logic, payments, etc.), and vice versa.

What's the upside for ecommerce?
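The API-first and headless ideas above can be sketched in a few lines: one framework-free, hypothetical product endpoint whose JSON contract is defined up front, so any front end (web store, mobile app, in-store kiosk) consumes the same data without touching the back end. All names here are illustrative, not from any real system.

```javascript
// A hypothetical in-memory catalog standing in for the back end.
const catalog = new Map([
  ["sku-123", { sku: "sku-123", name: "Trail Runner", price: 89.0, inStock: 4 }],
]);

// The contract for GET /products/:sku, defined before any UI exists.
// Every channel gets the same { status, body } shape.
function getProduct(sku) {
  const product = catalog.get(sku);
  if (!product) return { status: 404, body: { error: "not found" } };
  return { status: 200, body: product };
}
```

Because the front ends depend only on this contract, the web store layout can be redesigned, or the catalog storage swapped out, without breaking the other side.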
The four elements of the MACH approach come together to help ecommerce businesses reframe operations, avoid downtime, preserve revenue, provide the best user experience possible, and ultimately ensure your solutions are able to develop and evolve. To maintain a competitive advantage in an increasingly competitive commerce market, your application needs to keep up. The MACH approach to ecommerce could be the ideal way to set your application and your business apart. Want to learn more about the MACH approach and the role cloud-native database solutions like MongoDB Atlas play in the evolving world of digital retail? Get your free copy of Ecommerce at MACH Speed with MongoDB and Commercetools today.

November 30, 2021

Introducing the MongoDB Atlas Data API, Now Available in Preview

As the leading application data platform, we’re hyper-focused on accelerating and simplifying how developers leverage their application data. This has led to the introduction of features like serverless instances and Atlas Triggers that minimize the operational burden associated with traditional database workloads. Today, we’re excited to announce the next step forward in this mission with the introduction of the MongoDB Atlas Data API – a fully managed, REST-like API for accessing your Atlas data. The Data API makes it easy to perform CRUD operations and aggregations on your data in minutes, and allows you to query MongoDB from your backend in any language, without the need for drivers.

The next level of data access

Organizations are increasingly relying on operational data layers to build distributed architectures like microservices for their modern applications, to speed up development and stay competitive in rapidly changing markets. These stacks often require scalable, highly available, and secure access to the data layer. The most popular way to architect these data services is to build APIs that communicate with MongoDB data over HTTPS using REST or similar protocols. However, creating a custom-built API typically takes a lot of time and effort. It's a painful process that introduces unnecessary operational burdens like provisioning additional servers, connection management, and scaling. With the Atlas Data API, customers can generate a fully managed, REST-like API for their Atlas data in seconds. Developers no longer need to worry about the underlying infrastructure of their APIs, and instead can enjoy the efficiency of intuitive, out-of-the-box data access, while still being able to leverage the always-on and highly available qualities of Atlas as the underlying database.
This unlocks a whole new level of developer productivity for use cases that were previously time-consuming to accomplish – such as building data-centric microservices, simplifying access from serverless functions, and integrating with third-party services. The API even has built-in support for aggregation pipelines to use with services like Atlas Search.

Try the Atlas Data API

All customers now have the ability to enable the Data API for their Atlas deployment. We invite you to try it out today with a new or existing Atlas account. It’s incredibly easy to get started: simply choose the cluster you’d like to connect to and generate an API key. That’s all it takes to set up and start accessing Atlas data. Have questions? Check out our documentation or head over to our community forums to get answers from fellow developers.

What's next for the Atlas Data API

This preview release is just the beginning. Support for services like Data Lake and Serverless Instances will be added over the coming months. And, long term, we see the Data API as the next step in our journey to abstract and automate infrastructure decisions – to help developers build the future faster. Atlas Data API documentation can be found here.
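As a rough sketch of what a Data API request looks like, the object below describes a findOne call. The app id, API key, and cluster name are placeholders, and the exact endpoint path should be taken from your Atlas UI rather than from this example; the general shape (an action name in the URL, an api-key header, and a JSON body naming the data source, database, collection, and filter) follows the preview documentation.

```javascript
// Hypothetical request shape for the Data API's findOne action.
// <your-app-id> and <your-api-key> are placeholders, not real values.
const request = {
  method: "POST",
  url: "https://data.mongodb-api.com/app/<your-app-id>/endpoint/data/beta/action/findOne",
  headers: {
    "Content-Type": "application/json",
    "api-key": "<your-api-key>",
  },
  body: {
    dataSource: "Cluster0",          // the Atlas cluster to query
    database: "sample_mflix",        // Atlas sample data set
    collection: "movies",
    filter: { title: "The Matrix" }, // a normal MongoDB query filter
  },
};
```

From any language, this is just an HTTPS POST: for example, `curl` or Node's fetch would send `request.body` as JSON to `request.url` with those headers, and the matched document comes back as JSON.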

November 18, 2021

Turning MongoDB into a Predictive Database

There’s a growing interest in artificial intelligence (AI) and machine learning (ML) in the business world. The predictive capabilities of ML/AI enable rapid insights from patterns detected at rates faster than manual analysis. Businesses realize that this can lead to increased profits, reduced costs, and accelerated innovation. Although businesses both large and small can benefit from the power of AI, implementing a predictive analytics project can be both complex and time-consuming. MongoDB, Inc. (NASDAQ: MDB), the leading, modern general purpose database platform, and MindsDB, the open-source machine learning platform that brings automated machine learning to the database, established a technology partnership to advance machine learning innovation. This collaboration aims to streamline predictive capabilities for data science and data engineering teams within organizations to solve real-world business challenges.

What is the best approach?

Once you have identified the initial ML projects you’d like to focus on, choosing the right tools and methodologies can help speed up the time it takes to build, train, and optimize models. Model selection and feature engineering can be time-consuming and difficult if you aren’t aware of the specific dimensions the ML model is going to train on. AutoML models excel at testing a wide variety of different algorithms to model a hypothesis of interest. Existing state-of-the-art AutoML frameworks provide methods to optimize performance, including adjusting hyperparameters (such as the learning rate or batch size). The MindsDB AutoML framework extends beyond most conventional automated systems of hyperparameter tuning and enables novel upstream automation of data cleaning, data pre-processing, and feature engineering.
To empower users with transparent development, the framework encompasses explainability tools, enables processing for complex data types (NLP, time series, language modeling, and anomaly detection), and gives users customizability by allowing imported models of their choice. MindsDB also generates predictions at the data layer—an additional, significant advancement that accelerates development speed. Generating predictions directly in MongoDB Atlas with MindsDB AI Tables gives you the ability to consume predictions as regular data, query these predictions, and accelerate development speed by simplifying deployment workflows.

Getting started with MindsDB

We suggest starting with the cloud-managed version of MindsDB. MindsDB is also an open source project, so you can alternatively install it on your machine and run it locally. For simplicity, we recommend the Docker installation described below.

Install MindsDB using Docker

First, check that you have Docker installed by running:

docker run hello-world

To pull the image, run the following command:

docker pull mindsdb/mindsdb

Then, run the command below to start the container:

docker run -p 47334:47334 -p 47336:47336 mindsdb/mindsdb

If Docker is not an option, you can follow our docs on how to install MindsDB locally.

Setting up the connection

Connecting MindsDB to MongoDB can be done in two ways: by using MindsDB Studio (the GUI) or by using Mongo clients (the terminal). Currently, integration works by accessing MongoDB through MindsDB’s MongoDB API as a new data source. More information about connecting to MongoDB can be found here. Use the Mongo shell to connect to MindsDB’s MongoDB API. Please note that you must have Mongo shell version ≥3.6 to use the MindsDB MongoDB API. If you are following this tutorial using MindsDB Cloud, you can skip the section about config.json; a default configuration is set up before the MongoDB API starts.
The Mongo host will be the MindsDB Mongo API, which is defined inside the host key of config.json. Please find below a config.json example:

{
  "api": {
    "http": {
      "host": "",
      "port": "47334"
    },
    "mysql": {},
    "mongodb": {
      "host": "",
      "port": "47336",
      "user": "mindsdb",
      "password": "",
      "database": "mindsdb"
    }
  },
  "config_version": "1.4",
  "debug": true,
  "integrations": {},
  "storage_dir": "/mindsdb_storage"
}

The location of the config.json file is printed in the first output line of the log (as the Configuration file value) when MindsDB Server is started. If you want to change the host or default username, or include a password, you can make the changes there.

To connect to MindsDB via the GUI: We can use MindsDB Studio to create a connection between MindsDB and MongoDB to access the data we wish to train our model on. Open MindsDB Studio from your favorite web browser. From the menu located on the left, select Database Integration. Then, select ADD DATABASE. In the Connect to Database window:

Select MongoDB as the Supported Database
Add the subsequent information as Mongo host, port, username, and password

Now, we have successfully integrated with the MongoDB database. The next step is to use a Mongo client to connect to MindsDB’s Mongo API and train models. To connect to MindsDB’s Mongo API for a local connection, run:

mongo --host -u "username" -p "password"

If you are using MindsDB Cloud, you need to use your cloud username/password to connect to the MindsDB Mongo API:

mongo --host -u "cloud_username" -p "cloud_password"

Then use MindsDB’s database and list collections:

use mindsdb
show collections

Training a new Machine Learning Model using MQL

We will leverage the power of the Mongo Query Language (MQL) and MindsDB to train a model. The goal of the model is to predict the strength of a concrete mix, with input columns such as the age, amount of water used, and the types and quantities of additives used to make the mix stronger.
The dataset can be downloaded from Kaggle and represents a potential business use case in everyday construction projects: optimizing the strength of a mix while minimizing the amount of material used—a goal that saves on costs without neglecting function. You can follow this tutorial with your own data inside MongoDB or simply import the CSV file into a collection called material_strength. Alternatively, you can get the exported collection from the above data at this URL.

To train a new model, we need to call the insert() function on the mindsdb.predictors collection. Notably, the following information must be included:

db.predictors.insert({
  'name': 'strength',
  'predict': 'concrete_strength',
  'connection': 'MongoIntegration',
  'select_data_query': {
    'database': 'test_data',
    'collection': 'material_strength',
    'find': {}
  }
})

The ‘name’ is simply the model name, ‘predict’ is the feature that we aim to predict, and ‘connection’ is the name of the MongoDB connection we created using MindsDB Studio. Inside select_data_query we provide the name of the database, the collection, and the find() function to select the data. Once you enter this information, MindsDB begins the training process. To verify that training has completed, you can use the find() command to check the model status inside the mindsdb.predictors collection, e.g. db.predictors.find({'name': 'strength'}). Successful training will return a ‘status’: ‘complete’ notification.

MindsDB Studio provides additional useful information to go beyond predictions and explain the results. The below figure refers to feature importances, automatically calculated and displayed to reveal which columns of your data likely matter for predicting strength. This information can be obtained from MindsDB Studio by selecting the preview option on your trained model. Moreover, the preview option also provides a confusion matrix to help evaluate the performance of the model by bucketizing true and predicted values.
As this is a regression task, we stratify the true and predicted values to analyze how effective predictions are at reflecting the underlying data patterns. Strongly performing models have a notable diagonal component: this indicates that a model is successful at detecting the relationship between features and the output distribution. Elements located away from the main diagonal imply less accurate predictions (this could be, for example, due to sparse sampling of data in these output regions). The next step is to use MQL to get predictions back from the model collection.

Querying the model

After we have trained a model, we can go ahead and query it. Using MQL, we call the find() method on the model collection and provide specific values for which we would like to obtain a prediction. An example would be:

db.strength.find({'age': 28, 'superPlasticizer': 2.5, 'slag': 1, 'water': 162, 'fineAggregate': 1040})

The model created by MindsDB predicts a value of 17.3 with 90% confidence that the true value lies within the confidence_interval lower and upper bounds. One important piece of information is the important_missing_information value, where MindsDB suggests that including a value for the cement feature in the find() call will improve the prediction. This tutorial highlights the steps to create a predictive model inside MongoDB by leveraging MindsDB’s AutoML framework. Using the existing compute configuration, the example above took less than five minutes, without the need for extensive tooling or pipelines in addition to your database. With MindsDB’s predictive capabilities inside MongoDB, developers can now build machine learning models at reduced cost, gain greater insight into model accuracy, and help users make better data-based decisions.

Modernize with MongoDB and MindsDB

MongoDB provides an intuitive process for data management and exploration by simplifying and enriching data.
MindsDB helps turn data into intelligent insights by simplifying modernization into machine learning, AI, and the ongoing spectrum of data science. For a limited time, try MindsDB to connect to MongoDB, train models, and run predictions in the cloud. Simply sign up here. It’s free (final pricing to be announced later this year), and our team is available on Slack and GitHub for feedback and support. Check it out and let us know what predictions you come up with.

November 10, 2021

100x Faster Facets and Counts with MongoDB Atlas Search: Public Preview

Today we’ve released one of the most powerful features of Atlas Search in public preview, ready for your evaluation: lightning-fast facets and counts over large data sets. Faceted search allows users to filter and quickly navigate search results by categories and see the total number of results per category for at-a-glance statistics. With the new facet operator, facet and count operations are pushed down into Atlas Search’s embedded Lucene index and processed locally – taking advantage of 20+ years of Lucene optimizations – before the faceted result set is returned to the application. What this means is that facet-heavy workloads such as ecommerce product catalogs, content libraries, and counts now run up to 100x faster.

The power of facets and counts in full-text search

Faceting is a popular search and analytics capability that allows an application to group information into related categories by applying filters to query results. Users can narrow their search results by simply selecting a facet value as a filter criterion. They can intuitively explore complex data sets with fast and convenient navigation to quickly drill into the data that is of most interest. A common use of faceting is navigating product catalogs. With travel starting to reopen, let's take a travel site as an example. By using faceted search, the site can present vacation options by destination region, trip type (i.e., hotel, self-catering, beach, ski, city break), price band, season, and more, enabling users to quickly navigate to the category that is most relevant to them. Facets also enable fast results counting. Extending our travel site example, business analysts can use facets to quickly compare sales statistics by counting the number of trips sold by region and season. Prior to the new facet operator, the only way Atlas Search could facet and count data was to return the entire result set to MongoDB’s internal $facet aggregation pipeline stage.
While that was OK for smaller data sets, it became slow when the result set exceeded tens of thousands of documents. This all changes now that operations are pushed down to Atlas Search’s embedded and optimized Lucene library in a single $search pipeline stage. In our internal testing of a collection with one million documents, the new Atlas Search faceting improves performance by 100x.

How to use faceting in Atlas Search

Our new Atlas Search facets tutorial will help you get started. It describes how to create an index with a facet definition on string, date, and numeric fields in the sample_mflix.movies collection, then run an Atlas Search query against those fields for results grouped by values for the string field and by ranges for the date and numeric fields, including the count for each of those groups. To use Atlas Search facets, you must be running your Atlas cluster on MongoDB 4.4.11 and above or MongoDB 5.0.4 and above. These clusters must be running on the M10 tier or higher. Facets and counts currently work on non-sharded collections; support for sharded collections is scheduled for next year.

The power of Atlas Search in a unified application data platform in the cloud

MongoDB Atlas Search makes it easy to build fast, relevant full-text search on top of your data in the cloud. A couple of API calls or clicks in the Atlas UI, and you instantly expose your data to sophisticated search experiences that boost engagement and improve satisfaction with your applications. Your data is immediately more discoverable, usable, and valuable. By embedding the Apache Lucene library directly alongside your database, data is automatically synchronized with the search index; developers get to work with a single API; there is no separate system to run and pay for; and everything is fully managed for you on any cloud you choose.

Figure 1: Rather than bolting on a separate search engine to your database, Atlas Search provides a fully integrated platform.
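To give a flavor of the kind of faceted query the tutorial walks through, here is a sketch of a pipeline against sample_mflix.movies, built as a plain object so it can be inspected anywhere (in mongosh you would run db.movies.aggregate(pipeline)). Treat the exact stage and field names as assumptions drawn from the tutorial, and defer to the current docs for the authoritative syntax.

```javascript
// Sketch of a faceted Atlas Search query: a range operator narrows the
// result set, and two facets bucket the matches by genre and by year.
const pipeline = [
  {
    $searchMeta: {
      facet: {
        operator: {
          range: { path: "year", gte: 2000, lte: 2015 },
        },
        facets: {
          genresFacet: { type: "string", path: "genres", numBuckets: 10 },
          yearFacet: {
            type: "number",
            path: "year",
            boundaries: [2000, 2005, 2010, 2015],
          },
        },
      },
    },
  },
];
```

Because the facet and count work happens inside the embedded Lucene index, the pipeline returns just the bucket names and counts rather than the full result set.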
Atlas Search provides the power you get with Lucene — including faceted navigation, autocomplete, fuzzy search, built-in analyzers, highlighting, custom scoring, and synonyms — combining it with the productivity you get from MongoDB. As a result, developers can ship search applications and new features 30%+ faster.

Next steps

You can try out Atlas Search with the public preview of lightning-fast facets and counts today:

If you are new to Atlas Search, simply spin up a cluster (M10 tier or above) and get started with our Atlas Search facets tutorial.

If you are already using Atlas Search on M10 tiers and above, update your indexes to use the facet field mapping, and then start querying! Your data remains searchable while it is being re-indexed.

If you want to dig into the use cases you can serve with Atlas Search — along with users who are already taking advantage of it today — download our new Atlas Search whitepaper.

Safe Harbor

The development, release, and timing of any features or functionality described for our products remains at our sole discretion. This information is merely intended to outline our general product direction and should not be relied on in making a purchasing decision, nor is it a commitment, promise, or legal obligation to deliver any material, code, or functionality.

November 9, 2021

Sync or Stink: Why Some Apps Can't Handle Mobile Data

Mobile apps have transformed the way businesses work with data. For example, by automating and digitizing what were previously manual and error-prone tasks, mobile apps have made frontline workers more productive and the operational data they generate more actionable. But that is only true when those workers have modern mobile apps that are fully integrated with back-end systems at the core of the network. Too many mobile apps are slow to load and sync data unreliably, frequently returning stale data or simply crashing. The reason for this is that data sync is hard. A lot of developer teams take a DIY approach to building sync. But building a sync tool that works in an offline-first environment, like mobile, can take months of complex work and require thousands of lines of code. And when development teams build mobile sync themselves, they often oversimplify the solution — so app data may only sync a few times a day or won’t sync bidirectionally. Building sync the right way — keeping data up to date in real time whenever devices are connected — requires complicated networking and conflict resolution code.

MongoDB Realm eliminates this complexity

We cover many of the technical difficulties of developing sync tools, including the pitfalls of DIY approaches, in our recent white paper, “Unlocking Value in Mobile Apps: 3 Best Practices for Mobile Data Sync.” Data sync is absolutely essential for delivering good customer experiences, harnessing the value of mobile data for the business, and ultimately proving that investments in mobile technology are worth it. From getting real-time information to frontline workers to tracking fleets and getting timely inventory updates, mobile data has the potential to deliver true competitive advantage, but only if you’re working with a highly performant mobile platform. When apps work right, users reward you by using the app and, hopefully, leaving positive feedback.
Another good sign of a well-built app is that users ask for new features. Development teams that don’t have to worry about complicated conflict resolution code and an overly complex app architecture can focus on quickly developing new features and, in turn, creating better app experiences. MongoDB Realm and Realm Sync include pre-built conflict resolution out of the box. The Realm Mobile Database — used alongside Realm Sync — is object-oriented, so it’s intuitive to mobile developers. Realm enables developers to focus on delivering competitive differentiation instead of worrying about building complicated data sync and conflict resolution tools. This speeds development and enables teams to turn around feature requests faster. Realm Sync works alongside the Realm Mobile Database to synchronize data bidirectionally between the Realm Mobile Database on the client and MongoDB Atlas on the back end. Find out more by reading “Unlocking Value in Mobile Apps: 3 Best Practices for Mobile Data Sync.”
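To show roughly how little code sync takes from the app's point of view, here is a sketch of a Realm Sync configuration (partition-based sync, as shipped at the time of writing), built as a plain object so no SDK is needed to inspect it. The schema, app id, and partition value are illustrative placeholders.

```javascript
// A hypothetical Realm object schema for a task-tracking app.
const TaskSchema = {
  name: "Task",
  primaryKey: "_id",
  properties: { _id: "objectId", text: "string", done: "bool" },
};

// Build the sync configuration for a logged-in user. With the Realm SDK,
// this object would be passed to Realm.open(...); conflict resolution is
// handled by Realm Sync, with no custom merge code in the app.
function syncConfig(user) {
  return {
    schema: [TaskSchema],
    sync: {
      user,                      // an authenticated Realm user object
      partitionValue: "team-42", // placeholder: which slice of data to sync
    },
  };
}
```

The point of the sketch is the shape: declare what to sync and for whom, and the bidirectional networking and conflict resolution happen underneath.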

November 3, 2021

Real-time Applications Made Simple with MongoDB and Redpanda

MongoDB has a long history of advocating for simplicity and focusing on making developers more agile and productive. MongoDB first disrupted the database market with the document model, storing data records as BSON (a binary representation of JSON documents). This approach to working with data enables developers to easily store and query their data as they use it naturally within their applications. As your data changes, you simply add an attribute to your documents and move on to the next ticket. There is no need to waste time altering tables and constraints when the needs of your application change. MongoDB is always on the lookout for more ways to make life easier for developers, such as addressing the challenges of working with streaming data. With streaming data, it can take armies of highly skilled operational personnel to build and maintain a production-ready platform (like Apache Kafka). Developers then have to integrate their applications with these streaming data platforms, resulting in complex application architectures. It's exciting to see technologies like Redpanda seeking to improve developer productivity for working with streaming data. For those unfamiliar with Redpanda, it is a Kafka API-compatible streaming platform that works with the entire Kafka ecosystem, including Kafka Connect and popular Kafka drivers: librdkafka, kafka-python, and the Apache Kafka Java client. Redpanda is written in C++ and leverages the Raft protocol, which eliminates the need for Apache ZooKeeper. Also, its thread-per-core architecture and JVM-free implementation enable performance improvements over other data streaming platforms. On a side note, MongoDB also implements a protocol similar to Raft for its replica set primary and secondary elections and management. Both MongoDB and Redpanda share a common goal of simplicity and making complex tasks trivial for the developer. 
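The schema flexibility described above is easy to picture. In this hedged sketch, plain Python dicts stand in for BSON documents: adding an attribute is just adding a key, with no table alterations required.

```python
# Documents in the same collection need not share identical fields --
# adding an attribute requires no ALTER TABLE, just a new key.
collection = []
collection.append({"symbol": "MDB", "price": 500.0})
# Requirements change: newer documents gain a "currency" attribute.
collection.append({"symbol": "RPND", "price": 42.0, "currency": "USD"})

# Older documents remain valid and queryable alongside newer ones.
assert all("symbol" in doc for doc in collection)
assert "currency" not in collection[0]
assert collection[1]["currency"] == "USD"
```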
So we decided to show you how to pull together a simple streaming application using both technologies. The example application (found in this GitHub repository) covers a scenario where stock ticker data is written to Redpanda and consumed by MongoDB. Once the example is running, a "stock generator" creates a list of 10 fictitious companies and starts writing ticker data to a Redpanda topic. A Kafka Connect service listens for data arriving on this topic and "sinks" it to the MongoDB cluster. Once the data lands in MongoDB, the application uses the aggregation framework to calculate the moving averages of the stock prices and updates the UI. The repository's docker-compose script launches a Node server, a Redpanda deployment, a Kafka Connect service, and a MongoDB instance. The Kafka Connect image uses the Dockerfile-MongoConnect file to install the MongoDB Connector for Apache Kafka, and the Dockerfile-Nodesvr builds the nodesvr image, copying the web app code and installing the necessary packages via npm. To start the demo, run the included shell script, which invokes docker-compose to launch the containers. On success, you will see a list of the services and their ports:

The following services are running:
MongoDB server on port 27017
Redpanda on 8082 (Redpanda proxy on 8083)
Kafka Connect on 8083
Node server on 4000, hosting the API and homepage
Status of Kafka connectors

To tear down the environment and stop these services, run docker-compose down -v. Once started, navigate to localhost:4000 in a browser and click the "Start" button. After a few seconds, you will see the sample stock data from 10 fictitious companies with the moving average price. 
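The moving-average calculation itself is simple arithmetic. Here is a plain-Python equivalent of what the aggregation pipeline computes over recent ticker documents (the window size here is illustrative; see the repository for the actual pipeline):

```python
def moving_average(prices, window=3):
    """Sliding-window mean: for each position, average the most
    recent `window` prices seen so far."""
    out = []
    for i in range(len(prices)):
        window_slice = prices[max(0, i - window + 1): i + 1]
        out.append(sum(window_slice) / len(window_slice))
    return out

ticks = [10.0, 12.0, 11.0, 13.0]
avgs = moving_average(ticks, window=3)
assert avgs[0] == 10.0                       # only one sample so far
assert avgs[2] == (10.0 + 12.0 + 11.0) / 3
assert avgs[3] == (12.0 + 11.0 + 13.0) / 3
```

In the demo, MongoDB performs this same computation server-side with an aggregation query, so the Node server never has to pull the raw tick history into application code.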
Get started with MongoDB and Redpanda

This example showcases the simplicity of moving data through the Redpanda streaming platform and into MongoDB for processing. Check out these resources to learn more:
Introduction to Redpanda
MongoDB + Redpanda Example Application GitHub repository
Learn more about the MongoDB Connector for Apache Kafka
Ask questions on the MongoDB Developer Community forums
Sign up for MongoDB Atlas to get your free tier cluster

October 26, 2021

Driving Innovation with MongoDB Atlas on Google Cloud

MongoDB Atlas and Google Cloud's global partnership continues to set the standard for cloud modernization across industries. Our joint solution helps customers modernize their database stack when they migrate their infrastructure to the cloud, ultimately boosting developer productivity and reducing total cost of ownership (TCO) while giving customers access to state-of-the-art analytics and the machine learning capabilities of Google Cloud. Our partnership is producing plenty of exciting developments—read on to learn more about our joint awards, latest integrations, and what's next. Earlier this month, on October 12, MongoDB was announced as a Google Cloud Cross-Industry Customer of the Year. On the heels of an expanded, five-year partnership that began earlier in 2021, MongoDB and Google Cloud continue to create excellent experiences for our joint customers across the globe. Adopted by clients in the gaming, retail, healthcare, financial services, and automotive industries, among others, MongoDB Atlas on Google Cloud is experiencing exciting year-over-year growth in a variety of innovative fields. After putting much effort into launching joint integrations to become a first-class database within Google Cloud Marketplace, MongoDB was also named Google Cloud Technology Partner of the Year - Marketplace, for the second year in a row. Since being made available on the Google Cloud Console and Marketplace, MongoDB Atlas on the Marketplace has rapidly become the preferred engagement model for users. It is available as a pay-as-you-go offering, and clients can now use their Google Cloud spending commitments toward Atlas to take advantage of integrated billing and support. The intuitive database structure of MongoDB combined with the cloud computing power and security of Google Cloud enables clients to take a quantum leap forward from their legacy systems. 
With MongoDB Atlas on Google Cloud, companies gain the power to scale effortlessly, maintain business-critical reliability, and cut costs. So, what's the latest? Today’s users expect a seamless application experience regardless of where they’re located. MongoDB Atlas delivers responsive and reliable applications worldwide alongside continuous support for new regions. Most recently, three new Google Cloud regions for Atlas dedicated clusters were released in Delhi (asia-south2), Melbourne (australia-southeast2), and Warsaw (europe-central2)—for a total of 27 supported regions, allowing customers to deploy fully managed MongoDB databases in every Google Cloud region. These regions expand the geographical coverage not just of Atlas on Google Cloud, but Atlas as a whole. The availability of Atlas in new regions will provide lower-latency access to data, offer more options for disaster recovery globally, and make it easier for customers to satisfy local regulatory and compliance requirements. Companies across the world are benefitting from our global partnership. In Japan, e-commerce analytics company PLAID has migrated to MongoDB Atlas on Google Cloud to automate operational tasks like version upgrades and backups. And in North America, the largest tire distributor on the continent, ATD, now has its own custom event-driven architecture that’s accessible, pluggable, and scalable to support the company’s expanding business horizons. ATD can now scale up for bulk one-time loads and process more than 50 million transactions in less than four hours, which is nearly 5x their real-time transactions per day. In a development that was just announced, MongoDB Atlas now allows users to deploy a serverless database on Google Cloud via serverless instances , available in preview. 
Serverless computing is becoming increasingly popular among developers due to its undeniable benefits, including consumption-based pricing and a service that scales dynamically as needed, which eliminates the need to provision and manage capacity. Serverless computing abstracts and automates away many of the lower-level infrastructure decisions that consume precious developer time, so developers can instead focus on building differentiated features. With serverless instances, users can get started with minimal configuration: simply choose a region and give your database a name, and Atlas will provide the resources your app needs. For any company that needs faster insights from data stored all over the organization, Google Cloud introduced Datastream, and MongoDB Atlas was excited to be a launch partner. Datastream is a new serverless change data capture and replication service that allows developers to easily stream data from relational databases like Oracle and MySQL to MongoDB Atlas. By making data more accessible, Datastream and MongoDB Atlas help companies make their data more actionable, and therefore more valuable. Announced in August 2021, Google Cloud's Private Service Connect (PSC) is now generally available in all Google Cloud regions. By abstracting the underlying networking infrastructure, PSC allows users to create private and secure connections from cloud networks to services like MongoDB. PSC will allow Google Cloud customers to use MongoDB Atlas more easily, while ensuring that connectivity is private and secure. Connecting MongoDB Atlas with Google Cloud services will make it possible for enterprises to innovate faster. Together, MongoDB and Google Cloud are developing ways for companies to accelerate digital transformation, empower employees, and raise customer satisfaction to new levels. Learn more about what your organization can do with MongoDB and Google Cloud at our first-ever joint Developer Summit on October 28. 
Sign up now to hear directly from your peers about how they’re using the combined power of MongoDB and Google Cloud, and participate in a live developer Q&A.

October 25, 2021

Take Advantage of Low-Latency Innovation with MongoDB Atlas, Realm, and AWS Wavelength

The emergence of 5G networking signals future growth for low-latency business opportunities. Whether it's the ever-popular world of gaming, AR/VR, AI/ML, or the more critical areas of autonomous vehicles and remote surgery, there's never been a better opportunity for companies to leverage low-latency application services and connectivity. This kind of instantaneous communication through the power of 5G is still largely in its nascent development, but customers are adapting to its benefits quickly. New end-user expectations mean back-end service providers must meet growing demand. At the same time, business customers expect to be able to seamlessly deploy the same cloud-based back-end services they're familiar with, close to their data sources or end users. With MongoDB Realm and AWS Wavelength, you can now develop applications that take advantage of the low latency and higher throughput of 5G, and you can do it with the same tools you're familiar with. This blog post explores the benefits of AWS Wavelength, MongoDB Atlas, and Realm, as well as how to set up and use each service in order to build better web and mobile applications and evolve the user experience. We'll also walk through a real-world use case, featuring a smart factory as the example.

Introduction to MongoDB Atlas & Realm on AWS

MongoDB Atlas is a global cloud database service for modern applications. Atlas is the best way to run MongoDB on AWS because, as a fully managed database-as-a-service, it offloads the burden of operations, maintenance, and security to the world's leading MongoDB experts while running on industry-leading and reliable AWS infrastructure. MongoDB Atlas enables you to build applications that are highly available, performant at global scale, and compliant with the most demanding security and privacy standards. When you use MongoDB Atlas on AWS, you can focus on driving innovation and business value instead of managing infrastructure. 
Services like Atlas Search, Realm, Atlas Data Lake, and more are also offered, making MongoDB Atlas the most comprehensive data platform in the market. MongoDB Atlas seamlessly integrates with many AWS products. Click here to learn more about common integration patterns.

Why use AWS Wavelength?

AWS Wavelength is an AWS infrastructure offering that is optimized for mobile edge computing applications. Wavelength Zones are AWS infrastructure deployments that embed AWS compute and storage services within communications service providers' (CSP) data centers. AWS Wavelength allows customers to use industry-leading and familiar AWS tools while moving user data closer to them in 13 cities in the US as well as London, UK; Tokyo and Osaka, Japan; and Daejeon, South Korea. Pairing Wavelength with MongoDB's flexible data model and responsive Realm database for mobile and edge applications, customers get a familiar platform that can run anywhere and scale to meet changing demands.

Why use Realm?

Realm's integrated application development services make it easy for developers to build industry-leading apps on mobile devices and the web. Realm comes with three key features:
1. A cross-platform mobile and edge database
2. A cross-platform mobile and edge sync solution
3. Time-saving application development services

1. Mobile and edge database

Realm's mobile database is an open source, developer-friendly alternative to CoreData and SQLite. With Realm's open source database, mobile developers can build offline-first apps in a fraction of the time. Supported languages and frameworks include Swift, C#, Xamarin, JavaScript, Java, React Native, Kotlin, and Objective-C. The Realm database was built with a flexible, object-oriented data model, so it's simple to learn and mirrors the way developers already code. Because it was built for mobile, applications built on Realm are reliable, highly performant, and work across platforms.

2. Mobile and edge sync solution

Realm Sync is an out-of-the-box synchronization service that keeps data up-to-date between devices, end users, and your backend systems, all in real time. It eliminates the need to work with REST, simplifying your offline-first app architecture. Use Sync to back up user data, build collaborative features, and keep data up to date whenever devices are online, without worrying about conflict resolution or networking code.

Figure 2: High-level architecture of implementing Realm in a mobile application

Powered by the Realm Mobile and Edge Database on the client side and MongoDB Atlas on the backend, Realm is optimized for offline use and capable of scaling with you. Building a first-rate app has never been easier.

3. Application development services

With Realm app development services, your team can spend less time integrating backend data for your web apps and more time building the innovative features that push your business initiatives forward. Services include:
GraphQL
Functions
Triggers
Data access controls
User authentication

Reference Architecture

High-level design

Terminology-wise, we will be discussing three main tiers for data persistence: Far Cloud, Edge, and Mobile/IoT. The Far Cloud is the traditional cloud infrastructure business customers are used to. Here, the main parent AWS regions (such as US-EAST-1 in Virginia, US-WEST-2 in Oregon, etc.) are used for centralized retention of all data. While these regions are well known and trusted, few users or IoT devices are located in close proximity to these massive data centers, and internet-routed traffic is not optimized for low latency. As a result, we use AWS Wavelength Zones as our Edge Zones. An Edge Zone will synchronize the relevant subset of data from the centralized Far Cloud to the Edge. 
Partitioning principles are used such that users' data is stored close to them in one or a handful of these Edge Wavelength Zones, typically located in major metropolitan areas. The last layer of data persistence is on the mobile or IoT devices themselves. On modern 5G infrastructure, data can be synchronized to a nearby Edge Zone with low latency. For less latency-critical applications, or in areas where the parent AWS regions are closer than the nearest Wavelength Zone, data can also go directly to the Far Cloud.

Figure 3: High-level design of modern edge-aware apps using 5G, Wavelength, and MongoDB

Smart factory use case: Using Wavelength, MQTT, & Realm Sync

Transitioning from the theoretical, let's dig one level deeper into a reference architecture. One common use case for 5G and low-latency applications is a smart factory. Here, IoT devices in a factory can connect to 5G networks for both telemetry and command/control. Typically signaling over MQTT, these sensors can send messages to a nearby Wavelength Edge Zone. Once there, machine learning and analysis can occur at the edge, and data can be replicated back to the Far Cloud parent AWS regions. This is critical, as compute capabilities at the edge, while low-latency, are not always full-featured. As a result, centralizing data from many factories makes sense for needs such as long-term storage, analytics, and multi-region sync. Once data is in the Edge or the Far Cloud, consumers of this data (such as AR/VR headsets, mobile phones, and more) can access it with low latency for needs such as maintenance, alerting, and fault identification.

Figure 4: High-level three-tiered architecture of what we will be building through this blog post

Latency-sensitive applications cannot simply write to Atlas directly. This is where Realm comes in: it can run on mobile devices as well as on servers (such as in a Wavelength Zone) and provide low-latency local reads and writes. 
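The partitioning idea above can be sketched in a few lines of plain Python: each document is tagged with a partition key, and an edge zone holds only the subset that matches its own key. The zone codes and the `_pk` field mirror the partition-key example given later in the setup steps; this is an illustration only.

```python
# Partition-key filtering: the Far Cloud holds everything; each edge
# zone syncs only documents tagged with its own partition key.
# Illustrative sketch -- Realm Sync performs this filtering for you.
far_cloud = [
    {"_pk": "BOS", "device": "press-1", "reading": 73},
    {"_pk": "DFW", "device": "lathe-4", "reading": 51},
    {"_pk": "BOS", "device": "press-2", "reading": 68},
]

def edge_subset(docs, zone):
    """Documents that would sync down to the given edge zone."""
    return [d for d in docs if d["_pk"] == zone]

boston = edge_subset(far_cloud, "BOS")
assert len(boston) == 2
assert all(d["_pk"] == "BOS" for d in boston)
```

In the real architecture, Realm handles this filtering and the synchronization between tiers automatically.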
It will seamlessly synchronize data in real time from its local partition to the Far Cloud, and from the Far Cloud back or to other Edge Zones. Developers do not need to write complex sync logic; instead, they can focus on driving business value by writing applications that provide high performance and low latency. For highly available applications, AWS services such as Auto Scaling groups can be used to meet the availability and scalability requirements of the individual factory. Traditionally, this would be fronted by a load-balancing service from AWS or an open-source solution like HAProxy. Carrier gateways are deployed in each Wavelength Zone, and the carrier or client can handle routing to the nearest Edge Zone.

Setting up Wavelength

Deploying your application into Wavelength requires the following AWS resources:
A Virtual Private Cloud (VPC) in your region
A Carrier Gateway: a service that allows inbound/outbound traffic to/from the carrier network
A Carrier IP: an address that you assign to a network interface that resides in a Wavelength Zone
A public subnet
An EC2 instance in the public subnet
An EC2 instance in the Wavelength Zone with a Carrier IP address

We will be following the "Get started with AWS Wavelength" tutorial located here. At least one EC2 compute instance in a Wavelength Zone will be required for the subsequent Realm section below. The high-level steps are:
1. Enable Wavelength Zones for your AWS account
2. Configure networking between your AWS VPC and the Wavelength Zone
3. Launch an EC2 instance in your public subnet; this will serve as a bastion host for the subsequent steps
4. Launch the Wavelength application
5. Test connectivity

Setting up Realm

The Realm components we listed above can be broken out into three independent steps:
1. Set up a Far Cloud MongoDB Atlas cluster on AWS
2. Configure the Realm serverless infrastructure (including enabling sync)
3. Write a reference application utilizing Realm

1. Deploying your Far Cloud with Atlas on AWS

For this first section, we will be using a very basic Atlas deployment. For demonstration purposes, even the MongoDB Atlas free tier (called an M0) suffices. You can leverage the AWS MongoDB Atlas Quickstart to launch the cluster, so we will not enumerate the steps in specific detail. However, the high-level instructions are:
Sign up for a MongoDB Atlas account and then sign in
Click the Create button to display the Create New Database Deployment dialog
Choose a "Shared" cluster, then choose the size of M0 (free)
Be sure to choose AWS as the cloud; here we will be using US-EAST-1
Deploy and wait for the cluster to complete deployment

2. Configuring Realm and Realm Sync

Once the Atlas cluster has completed deploying, the next step is to create a Realm application and enable Realm Sync. Realm has a full user interface inside of the MongoDB Cloud Platform; it also has a CLI and API which allow connectivity to CI/CD pipelines and processes, including integration with GitHub. The steps we are following are a high-level overview of a reference application located here. Since Realm configurations can be exported, the configuration can be imported into your environment from that repository. The high-level steps to create this configuration are as follows:
While viewing your cluster, click the Realm tab at the top
Click "Create a New App" and give it a name such as RealmAndWavelength
Choose the target cluster for sync to be the cluster you deployed in the previous step

Now we have a Realm app deployed. Next, we need to configure the app to enable sync. Sync requires credentials for each sync application. You can learn more about authentication here. 
Our application will use API key authentication. To turn that on:
Click Authentication on the left
On the Authentication Providers tab, find API Keys, and click Edit
Turn on the provider and Save

If Realm has Drafts enabled, a blue bar will appear at the top where you need to confirm your changes; confirm and deploy the change. You can now create an API key by pressing the "Create API Key" button and giving it a name. Be sure to copy the key down for our application later, as it cannot be retrieved again for security reasons. Also, in the top left of the Realm UI there is a button to copy the Realm App ID; we will need this ID and the API key when we write our application shortly.

Lastly, we can enable Sync. The Sync configuration relies on a schema of the data being written. This allows the objects (i.e., C# or Node.js objects) from the application we write in the next step to be translated into MongoDB documents. You can learn more about schemas here. We also need to identify a partition key. Partition keys are used to decide what subset of data should reside on each Edge node or each mobile device. For Wavelength deployments, this is typically a variation on the region name. A good partition key could be one unique per API key, or the name of the Wavelength region (e.g., "BOS" or "DFW"). For the latter example, it would mean that your Far Cloud retains data for all zones, but the Wavelength Zone in Boston will only have data tagged with "BOS" in the _pk field.

The two ways to define a schema are to write the JSON by hand or to use automatic generation. For the former, we would go to the Sync configuration, edit the Configuration tab, choose the cluster we deployed earlier, define a partition key (such as _pk as a string), then define the rules of what that user is allowed to read and write. Then you must write the schema in the Schema section of the Realm UI. However, it is often easier to let Realm auto-detect and write the schema for you. This can be done by putting the Sync into "Development Mode." While you still choose the cluster and partition key, you only need to specify what database you want to sync all of your data to. After that, in the application written below you can define classes, and upon connection to Realm Sync, the Sync engine will translate the classes you defined in your application into the underlying JSON representing that schema automatically.

3. Writing an application using Realm Sync: an MQTT broker for a smart factory

Now that the back-end data storage is configured, it is time to write the application. As a reminder, we will be writing an MQTT broker for a smart factory. IoT devices will write MQTT messages to this broker over 5G, and our application will take that packet of information and insert it into the Realm database. After that, because we completed the sync configuration above, Edge-to-Far-Cloud synchronization is automatic, and it works bidirectionally. The reference application mentioned above is available in this GitHub repository. It is based on creating a C# console application with the documentation here. The code is relatively straightforward:
Create a new C# console application in Visual Studio
Like any other C# console application, have it take in the Realm App ID and API key as CLI arguments. These should be passed in via Docker environment variables later; their values are the ones you recorded in the previous sync setup step
Define the RealmObject, which is the data model to write to Realm
Process incoming MQTT messages and write them to Realm

The data model for Realm objects can be as complex as makes sense for your application. 
To prove this all works, we will keep a basic model:

```csharp
public class IOTDataPoint : RealmObject
{
    [PrimaryKey]
    [MapTo("_id")]
    public ObjectId Id { get; set; } = ObjectId.GenerateNewId();

    [MapTo("_pk")]
    public string Partition { get; set; }

    [MapTo("device")]
    public string DeviceName { get; set; }

    [MapTo("reading")]
    public int Reading { get; set; }
}
```

To sync an object, it must inherit from the RealmObject class. After that, just define getters and setters for each data point you want to sync. The C# implementation will vary depending on what MQTT library you choose. Here we have used MQTTnet, so we simply create a new broker with MqttFactory().CreateMqttServer(), then start it with a specific MqttServerOptionsBuilder where we define anything unique to the setup, such as port, encryption, and other basic broker information. We then hook incoming messages with .WithApplicationMessageInterceptor() so that any time a new MQTT packet comes into the broker, we send it to a method that writes it to Realm. The actual Realm code is also simple:
Create an App with App.Create(), which takes in the App ID we are passing in as a CLI argument
Log in with app.LogInAsync(Credentials.ApiKey()), where the API key is again passed in as a CLI argument from what we generated before
To insert into the database, all writes for Realm need to be done in a transaction. The syntax is straightforward: instantiate an object based on the RealmObject class we defined previously, then perform the write with realm.Write(() => realm.Add(message))

Finally, we need to wrap this up in a Docker container for easy distribution. Microsoft has a good tutorial on how to run this application inside of a Docker container with auto-generated Dockerfiles. On top of the auto-generated Dockerfile, be sure to pass the Realm App ID and API key arguments to the application as we defined earlier. 
Learning the inner workings of writing a Realm application is largely outside the scope of this blog post. However, there is an excellent tutorial within MongoDB University if you would like to learn more about the Realm SDK. Now that the application is running, and in Docker, we can deploy it in a Wavelength Edge Zone as we created above.

Bringing Realm and Wavelength together

In order to access the application server in the Wavelength Zone, we must go through the bastion host we created earlier. Once we've gone through that jump box to get to the EC2 instance in the Wavelength Zone, we can install any prerequisites (such as Docker) and start the Docker container running the Realm edge database and MQTT application. Any new inbound messages received by this MQTT broker will first be written to the Edge and seamlessly synced to Atlas in the Far Cloud. There is a sample MQTT random-number-generator container suitable for testing this environment located in the GitHub repository mentioned earlier. Our smart factory reference application is complete! At this point:
Smart devices can write to a 5G Edge with low latency, courtesy of AWS Wavelength Zones
MQTT messages written to the broker in the Wavelength Zone have low-latency writes and are available immediately for reads, since everything happens at the Edge through MongoDB Realm
Those messages are automatically synchronized to the Far Cloud for permanent retention, analysis, or synchronization to other Zones via MongoDB Realm Sync and Atlas

What's Next

Get started with MongoDB Realm on AWS for free:
Create a MongoDB Realm account
Deploy a MongoDB backend in the cloud with a few clicks
Start building with Realm
Deploy AWS Wavelength in your AWS account

October 14, 2021

Ready to get Started with MongoDB Atlas?

Start Free