MongoDB Blog

Articles, announcements, news, updates and more

Modernize your GraphQL APIs with MongoDB Atlas and AWS AppSync

Modern applications typically need data from a variety of data sources, which are frequently backed by different databases and fronted by a multitude of REST APIs. Consolidating that data into a single coherent API presents a significant challenge for application developers. GraphQL emerged as a leading data query and manipulation language to simplify consolidating various APIs. GraphQL provides a complete and understandable description of the data in your API, giving clients the power to ask for exactly what they need — while making it easier to evolve APIs over time. It complements popular development stacks like MEAN and MERN, aggregating data from multiple origins into a single source that applications can then easily interact with.

MongoDB Atlas: A modern developer data platform

MongoDB Atlas is a modern developer data platform with a fully managed cloud database at its core. It provides rich features like native time series collections, geospatial data, multi-level indexing, search, isolated workloads, and many more — all built on top of the flexible MongoDB document data model. MongoDB Atlas App Services helps developers build apps, integrate services, and connect to their data by reducing operational overhead through features such as the hosted Data API and GraphQL API. The Atlas Data API allows developers to easily integrate Atlas data into their cloud apps and services over HTTPS with a flexible, REST-like API layer. The Atlas GraphQL API lets developers access Atlas data from any standard GraphQL client with an API that is generated from your data's schema.

AWS AppSync: Serverless GraphQL and pub/sub APIs

AWS AppSync is an AWS managed service that allows developers to build GraphQL and Pub/Sub APIs. With AWS AppSync, developers can create APIs that access data from one or many sources and enable real-time interactions in their applications. The resulting APIs are serverless, automatically scale to meet the throughput and latency requirements of the most demanding applications, and charge only for requests to the API and for real-time messages delivered.

Exposing your MongoDB data over a scalable GraphQL API with AWS AppSync

Together, AWS AppSync and MongoDB Atlas help developers create GraphQL APIs by integrating multiple REST APIs and data sources on AWS. This gives frontend developers a single GraphQL API data source to drive their applications. Compared to REST APIs, developers gain flexibility in defining the structure of the data while reducing payload size by returning only the attributes that are required. Additionally, developers can take advantage of other AWS services such as Amazon Cognito, AWS Amplify, Amazon API Gateway, and AWS Lambda when building modern applications. This allows for a serverless end-to-end architecture, backed by MongoDB Atlas serverless instances and available in pay-as-you-go mode from the AWS Marketplace.

Paths to integration

AWS AppSync uses data sources and resolvers to translate GraphQL requests and to retrieve data; for example, users can fetch MongoDB Atlas data using AppSync Direct Lambda Resolvers. Below, we explore two approaches to implementing Lambda Resolvers: using the Atlas Data API or connecting directly via MongoDB drivers.

Using the Atlas Data API in a Direct Lambda Resolver

With this approach, developers leverage the pre-created Atlas Data API when building a Direct Lambda Resolver.
This ready-made API acts as the data source in the resolver and supports popular authentication mechanisms based on API keys, JWT, or email/password, enabling seamless integration with Amazon Cognito to manage customer identity and access. The Atlas Data API lets you read and write data in Atlas using standard HTTPS requests and comes with managed networking and connections, replacing your typical app server. Any runtime capable of making HTTPS calls is compatible with the API.

Figure 1: Architecture details of the Direct Lambda Resolver with the Data API.

Figure 1 shows how AWS AppSync leverages the AWS Lambda Direct Resolver to connect to the MongoDB Atlas Data API. The Atlas Data API then interacts with your Atlas cluster to retrieve and store the data.

MongoDB driver-based Direct Lambda Resolver

With this option, the Lambda Resolver connects to MongoDB Atlas directly via drivers, which are available in multiple programming languages and provide idiomatic access to MongoDB. MongoDB drivers support a rich set of functionality and options, including the MongoDB Query Language, write and read concerns, and more.

Figure 2: Architecture of Direct Lambda Resolvers using native MongoDB drivers.

Figure 2 shows how the AWS AppSync endpoint leverages Lambda Resolvers to connect to MongoDB Atlas. The Lambda function uses a MongoDB driver to make a direct connection to the Atlas cluster and to retrieve and store data. The table below summarizes the different resolver implementation approaches.

Table 1: Feature comparison of resolver implementations

Setup

Atlas cluster: Set up a free cluster in MongoDB Atlas, configure the database for network security and access, and set up the Data API.
Secrets Manager: Create an AWS Secrets Manager secret to securely store database credentials.
Lambda function: Create Lambda functions with the MongoDB Data API or MongoDB drivers, as shown in this GitHub tutorial.
AWS AppSync setup: Set up AWS AppSync to configure the data source and query.
Test API: Test the AWS AppSync APIs using the AWS Console or Postman.

Figure 3: Test results for the AWS AppSync query.

Conclusion

To learn more, refer to the AppSync Atlas Integration GitHub repository for step-by-step instructions and sample code. This solution can be extended to AWS Amplify for building mobile applications. For further information, please contact partners@mongodb.com.
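To make the driver-based resolver option more concrete, below is a minimal, hypothetical sketch of a .NET Lambda-style handler that uses the MongoDB C# driver. The database, collection, and environment variable names are illustrative only, and in practice the connection string would come from AWS Secrets Manager rather than a plain environment variable.

using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public class MovieResolver
{
    // Reuse the client across invocations so Lambda warm starts don't reconnect every time.
    private static readonly MongoClient Client =
        new MongoClient(Environment.GetEnvironmentVariable("ATLAS_URI"));

    // AppSync passes GraphQL arguments in the event payload; here the title arrives directly.
    public async Task<string> GetMovieByTitle(string title)
    {
        var movies = Client.GetDatabase("sample_mflix").GetCollection<BsonDocument>("movies");
        var movie = await movies.Find(new BsonDocument("title", title)).FirstOrDefaultAsync();
        return movie?.ToJson();
    }
}

The Data API option follows the same pattern, except the handler issues an HTTPS request to the hosted Data API endpoint instead of opening a driver connection.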

November 23, 2022
Applied

MongoDB Joins Auth0 to Help Startups Combat Security Risks

We are excited to announce that MongoDB for Startups is collaborating with Auth0 for Startups to provide top security for applications built by the most innovative startups.

Why should a startup be part of the MongoDB and Auth0 startup programs? Customers, investors, and stakeholders expect many different things from a company, but one common requirement is responsibly managing their data. Companies choose MongoDB because it accelerates application development and makes it easier for developers to work with data. Developers who are mindful of security, compliance, and privacy when it comes to data use the robust Auth0 platform to create great customer experiences with features like single sign-on and multi-factor authentication.

“Auth0 and MongoDB are very complementary in nature. While MongoDB provides a strong, secure data platform to store sensitive workloads, Auth0 provides secure access for anyone with the proper authorization," says Soumyarka Mondal, co-founder of Sybill.ai. "We are safely using Auth0 as one of the data stores for the encryption piece, as well as using those keys to encrypt all of our users’ confidential information inside MongoDB.”

What is the Auth0 for Startups Program?

Auth0, powered by Okta, takes a modern approach to identity and enables startups to provide secure access to any application, for any user. Through Auth0 for Startups, we are bringing the convenience, privacy, and security of Auth0 to early-stage ventures, allowing them to focus on growing their business quickly. The Auth0 for Startups program is free for one year and supports:

100,000 monthly active users
Five enterprise connections
Passwordless authentication
Breached password detection
50+ integrations, 60+ SDKs, and 50+ social and IdP connections

What is the MongoDB for Startups Program?

MongoDB for Startups is focused on enabling the success of high-growth startups from ideation to IPO. The program is designed to give startups access to the best technical database for their rapidly scaling ventures. Program participants receive:

$500 in credits for all MongoDB cloud products (valid for 12 months)
A dedicated technical advisor for a two-hour, one-to-one consultation to help with data migration and optimization
Co-marketing opportunities
Access to the MongoDB developer ecosystem and to our VC partners

Apply to Auth0 for Startups and the MongoDB for Startups Program today.

November 23, 2022
Applied

MongoDB and AWS: Simplifying OSDU Metadata Management

In this decade of the 2020s, the energy sector is experiencing two major changes at the same time: The transition from fossil to renewables, and the digital transformation that changes the way businesses operate through better applications and tools that help streamline and automate processes. To support both of these challenges, the Open Group OSDU Forum has created a new data platform standard for the energy industry that seeks to reduce data silos and enable transformational workflows via an open, standards-based API set and supporting ecosystem. OSDU (Open Subsurface Data Universe) is an industry-defining initiative that provides a unified approach to store and retrieve data in a standardized way in order to allow reductions in infrastructure cost, simplify the integration of separate business areas, and adopt new energy verticals within the same architectural principles. Amazon Web Services (AWS) — as an early supporter of OSDU — provides a premier, cloud-first offering available across more than 87 availability zones and 27 regions. MongoDB — an OSDU member since 2019 — and AWS are collaborating to leverage MongoDB as part of the AWS OSDU platform for added flexibility and to provide a robust multi-region OSDU offering to major customers. Why MongoDB for OSDU? OSDU provides a unique challenge, as its architecture is set to support a varied data set originating from the oil and gas industry, while also being extensible enough to support the expanding requirements of new energy and renewables. It must be able to support single-use on a laptop for beginning practitioners, yet scale to the needs of experts with varying deployment scenarios — from on-premises, in-field, and cloud — and from single tenant on one region to multi-region and multi-tenant applications. Furthermore, OSDU architectural principles separate raw object data from the metadata that describes it, which puts an additional burden on the flexibility needed to manage OSDU metadata, while supporting all the above requirements. Enter MongoDB Since 2008, MongoDB has championed the use of the document model as the data store that supports a flexible JSON-type structure, which can be considered a superset of different existing data types — from tabular, key-value, and text to geo-spatial, graph, and time series. Thus, MongoDB has the flexibility not only to support just the main metadata services in OSDU but also to adapt to the needs of domain-specific services as OSDU evolves. The flexibility of MongoDB allows users to model and query the data in a variety of ways within the same architecture without the need to proliferate disparate databases for each specific data type, which incurs overhead both in terms of deployment, cost and scale, and the ability to query. The schema flexibility inherent in this document model allows developers to adapt and make changes quickly, without the operational burden that comes with schema changes with traditional tabular databases. MongoDB can also scale from the smallest environment to massive, multi-region deployments, with cross-regional data replication support that is available today across more than 90 regions with MongoDB Atlas . With the addition of MongoDB’s cluster-to-cluster sync , MongoDB can easily support hybrid deployments bridging on-premises or edge to the cloud, a requirement that is increasingly important for energy supermajors or for regions where data sovereignty is paramount. 
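As a small illustration of the schema flexibility described above (not actual OSDU reference-implementation code), the following hedged C# sketch stores two differently shaped metadata records in the same collection and queries them through a single interface. The database, collection, and field names are hypothetical.

using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class MetadataDemo
{
    public static async Task RunAsync()
    {
        var collection = new MongoClient(Environment.GetEnvironmentVariable("ATLAS_URI"))
            .GetDatabase("osdu_demo")
            .GetCollection<BsonDocument>("metadata");

        // Two records with different shapes live side by side in one collection.
        await collection.InsertManyAsync(new[]
        {
            new BsonDocument { { "kind", "wellbore" }, { "name", "WB-001" }, { "depthMeters", 3200 } },
            new BsonDocument { { "kind", "legaltag" }, { "name", "public-usa" },
                { "properties", new BsonDocument("countryOfOrigin", new BsonArray { "US" }) } }
        });

        // One query surface over both shapes; an index on "kind" keeps this efficient.
        var legalTags = await collection.Find(new BsonDocument("kind", "legaltag")).ToListAsync();
    }
}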
Example: LegalTag

An example of the benefit of MongoDB’s document model is OSDU’s LegalTag Compliance Service, which governs the legal status of data in the OSDU data ecosystem. A LegalTag is a collection of JSON properties that governs how the data can be consumed and ingested. With MongoDB, the properties are directly stored, indexed, and made available to be queried — even via full-text search for more advanced use cases. The schema flexibility simplifies integrating additional derived data from ingested data sources, which is used to further enrich the LegalTag metadata. Here the JSON document can accommodate more nodes to integrate this data without the need to create and manage new tables and data structures.

AWS OSDU with MongoDB

MongoDB and AWS collaborated to provide a MongoDB-based metadata implementation (Figure 1), which is available for all main OSDU services: Partition, Entitlements, Legal, Schema, and Storage. The AWS default OSDU Partition service leverages MongoDB due to its simple replication capabilities (auto-deployable via CloudFormation, Terraform, and Kubernetes), which simplify identifying the correct connection information at runtime for the correct OSDU partition in a multi-region, multi-cluster deployment. The OSDU Entitlements service manages authorization and permissions for access to OSDU services and their data using groups. The most recent OSDU reference implementation for Entitlements leverages a graph model to manage the relationships between groups, members, and owners. Thus, AWS again chose MongoDB, with its inherent graph capabilities through the document model, to simplify the implementation without the need to integrate a further dedicated database technology into the architecture.

Figure 1: MongoDB metadata service options with AWS OSDU.

Other potential benefits for OSDU

MongoDB also offers workload isolation, which provides the ability to dedicate instances only for reporting workloads against the operational dataset. This makes it possible to create real-time observability of the system based on the activity on metadata. Triggers and aggregation pipelines allow the creation of an alternate view of activity in real time, which can easily be visualized via MongoDB Charts (part of Atlas) without the need for a dedicated visualization system.

Flexibility and consistency

A major use case for both the energy industry and the direction of OSDU is the ability to capture and preprocess data closest to where it originated. For remote locations where direct connections to the cloud are prohibitive, this approach is often the only option — think Arctic or off-shore locations. Additionally, certain countries have data sovereignty laws that require an alternative deployment option outside of the public cloud. A MongoDB-based OSDU implementation can provide a distinct advantage, as MongoDB as a data platform supports deployment in the field (e.g., off-shore), on-premises, in private clouds (e.g., Kubernetes, Terraform), in the public cloud (e.g., AWS), and as a SaaS implementation (e.g., Atlas). Adoption of MongoDB for OSDU provides consistency across different deployment/cloud scenarios, thereby reducing the overhead of managing and operating a disparate set of technologies where multiple scenarios are required.

Conclusion

OSDU was created to change the way data is collected and shared across the oil and gas and energy industry. Its intent is to accelerate digital transformation within the industry.
The range of use cases and deployment scenarios requires a solution that provides flexibility in the supported datasets, flexibility for developers to innovate without additional schema and operational burden, and flexibility to be deployed in a variety of environments. Through the collaboration of AWS and MongoDB, there is an additional metadata storage option available for OSDU that provides a modern technology stack with the performance and scalability needed for the most demanding scenarios in the energy industry.

Related resources:
1. MongoDB Atlas
2. MongoDB Edge Computing
3. OSDU Data Platform on AWS

November 22, 2022
Applied

Manage and Store Data Where You Want with MongoDB

Increasingly, data is stored in a public cloud as companies realize the agility and cost benefits of running on cloud infrastructure. At any given time, however, organizations must know where their data is located, replicated, and stored — as well as how it is collected and processed to constantly ensure personal data privacy. Creating a proper structure for storing your data just where you want it can be complex, especially with the shift towards geographically dispersed data and the need to comply with local and regional privacy and data security requirements. Organizations without a strong handle on where their data is stored potentially risk millions of dollars in regulatory fines for mishandling data, loss of brand credibility, and distrust from customers. Geographically dispersed data and various compliance regulations also impact how organizations design their applications, and many see these challenges as an opportunity to transform how they engage with data. For example, organizations get the benefits of a multi-cloud strategy and avoid vendor lock-in, knowing that they can still run on-premises or on a different cloud provider. However, a flexible data model is needed to keep data within the confines of the country or region where the data originates. MongoDB runs where you want your data to be — on-premises, in the cloud, or as an on-demand, fully managed global cloud database. In this article, we’ll look at ways MongoDB can help you keep your data exactly where you need it. Major considerations for managing data When managing data, organizations must answer questions in several key areas, including: Process: How is your company going to scale security practices and automate compliance for the most prevalent data security and privacy regulatory frameworks? Penalties: Are your business leaders fully aware of the costs associated with not adhering to regulations when storing and managing your data? Scalability: Do you have an application that you anticipate will grow in the future and can scale automatically as demand requires? Infrastructure: Is legacy infrastructure keeping you from being able to easily comply with data regulations? Flexibility: Is your data architecture agile enough to meet regulations quickly as they grow in breadth and complexity? Cost: Are you wasting time and money with manual processes when adhering to regulations and risking hefty fines related to noncompliance? How companies use MongoDB to store data where they want and need it When storing and managing data in different regions and countries, organizations must also understand the rules and regulations that apply. MongoDB is uniquely positioned to support organizations to meet their data goals with intuitive security features and privacy controls, as well as the ability to geographically deploy data clusters and backups in one or several regions. Zones in sharded clusters MongoDB uses sharding to support deployments with very large data sets and high-throughput operations. In sharded clusters, you can create zones of sharded data based on the shard key, which helps improve the locality of data. Network isolation and access Each MongoDB Atlas project is provisioned into its own virtual private cloud (VPC), thereby isolating your data and underlying systems from other MongoDB Atlas users. This approach allows businesses to meet data requirements while staying highly available within each region. 
Each shard of data will have multiple nodes that automatically and transparently fail over for zero downtime, all within the same region.

Multi-cloud clusters

MongoDB Atlas is the only globally distributed, multi-cloud database. It lets you deploy a single cluster across AWS, Microsoft Azure, and Google Cloud without the operational complexity of managing data replication and migration across clouds. With the ability to define a geographic location for each document, your teams can also keep relevant data close to end users for regulatory compliance.

IP whitelists

IP whitelists allow you to specify the range of IP addresses against which access will be granted, delivering granular control over data.

Queryable encryption

Queryable encryption enables encryption of sensitive data from the client side, stored as fully randomized, encrypted data on the database server side. This feature delivers the utmost in security without sacrificing performance and is available on both MongoDB Atlas and Enterprise Advanced.

MongoDB Atlas global clusters

Atlas global clusters allow organizations with distributed applications to geographically partition a fully managed deployment in a few clicks and to control the distribution and placement of their data with sophisticated policies that can be easily generated and changed. Thus, your organization can not only achieve compliance with local data protection regulations more easily but also reduce overhead.

Client-Side Field Level Encryption

MongoDB’s Client-Side Field Level Encryption (FLE) dramatically reduces the risk of unauthorized access or disclosure of sensitive data. Fields are encrypted before they leave your application, protecting them everywhere — in motion over the network, in database memory, at rest in storage and backups, and in system logs.

Segmenting data by location with sharded clusters

As your application gets more popular, you may reach a point where your servers reach their maximum load. Before you reach that point, you must plan for scaling your database to adjust resources to meet demand. Scaling can occur temporarily, with a sudden burst of traffic, or permanently, with a constant increase in the popularity of your services. Increased usage of your application brings three main challenges to your database server:

The CPU and/or memory becomes overloaded, and the database server either cannot respond to all the request throughput or cannot do so in a reasonable amount of time.
Your database server runs out of storage and thus cannot store all the data.
Your network interface is overloaded, so it cannot support all the network traffic received.

When your system resource limits are reached, you will want to consider scaling your database. Horizontal scaling refers to bringing on additional nodes to share the load. This process is difficult with relational databases because of the difficulty of spreading related data across nodes. With non-relational databases, this is simpler because collections are self-contained and not coupled relationally, which allows them to be distributed across nodes more easily, as queries do not have to “join” them together across nodes. Horizontal scaling with MongoDB Atlas is achieved through sharding. With sharded clusters, you can create zones of sharded data based on the shard key. You can associate each zone with one or more shards in the cluster. A shard can be associated with any number of zones.
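As a hedged illustration of how zones can be attached to shards and key ranges (Atlas global clusters configure the equivalent for you through the UI), the following C# sketch issues the underlying admin commands through mongos. The shard, zone, namespace, and shard key names are hypothetical.

using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class ZoneSetup
{
    public static async Task ConfigureAsync()
    {
        var admin = new MongoClient("mongodb://mongos-host:27017").GetDatabase("admin");

        // Associate a shard with a zone.
        await admin.RunCommandAsync<BsonDocument>(new BsonDocument
        {
            { "addShardToZone", "shard-eu-1" },
            { "zone", "EU" }
        });

        // Pin the shard key range for EU customers to that zone.
        await admin.RunCommandAsync<BsonDocument>(new BsonDocument
        {
            { "updateZoneKeyRange", "mydb.customers" },
            { "min", new BsonDocument { { "region", "EU" }, { "customerId", BsonMinKey.Value } } },
            { "max", new BsonDocument { { "region", "EU" }, { "customerId", BsonMaxKey.Value } } },
            { "zone", "EU" }
        });
    }
}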
In a balanced cluster, MongoDB migrates chunks covered by a zone only to the shards associated with that zone. Distributing each shard's replica set members across multiple data centers also improves resilience:

If one of the data centers goes down, the data is still available for reads, unlike with a single data center distribution.
If the data center with a minority of the members goes down, the replica set can still serve write operations as well as read operations.
However, if the data center with the majority of the members goes down, the replica set becomes read-only.

Figure 1 illustrates a sharded cluster that uses geographic zones to manage and satisfy data segmentation requirements.

Figure 1: Sharded cluster.

Other benefits of MongoDB Atlas

MongoDB Atlas also provides organizations with an intuitive UI and administration API to efficiently perform tasks that would otherwise be very difficult. Upgrading your servers or setting up sharding without having to shut down your servers can be a challenge, but MongoDB Atlas removes this layer of difficulty through the features described here. With MongoDB, scaling your databases can be done with a couple of clicks.

Meeting your data goals with MongoDB

Organizations are uniquely positioned to store and manage data where they want it with the MongoDB features discussed above. With the shift toward geographically dispersed data, organizations must make sure they are aware of – and fully understand – the local and regional rules and requirements that apply to storing and managing data. To learn more about how MongoDB can help you meet your data goals, check out the following resources:

MongoDB Atlas security, with built-in security controls for all your data
Entrust MongoDB Cloud Services with sensitive application and user data
Scalability with MongoDB Atlas

November 22, 2022
Applied

Optimizing Your MongoDB Deployment with Performance Advisor

We are happy to announce additional enhancements to MongoDB’s Performance Advisor, now available in MongoDB Atlas, MongoDB Cloud Manager, and MongoDB Ops Manager. MongoDB’s Performance Advisor automatically analyzes logs for slow-running queries and provides index suggestions to improve query performance. In this latest update, we’ve made some key improvements, including:

A new ranking algorithm and additional performance statistics (e.g., average documents scanned, average documents returned, and average object size) that make it easier to understand the relative importance of each index recommendation.
Support for additional query types, including regexes, negation operators (e.g., $ne, $nin, $not), $count, $distinct, and $match, to ensure more of your queries are covered by optimized index suggestions.
Index recommendations that are now more deterministic, so they are less affected by time and provide more consistent query performance benefits.

Before diving further into MongoDB’s Performance Advisor, let’s look at the tools MongoDB provides out of the box to simplify database monitoring.

Background

Deploying your MongoDB cluster and getting your database running is a critical first step, but another important aspect of managing your database is ensuring that it is performant and running efficiently. To make this easier for you, MongoDB offers several out-of-the-box monitoring tools, such as the Query Profiler, Performance Advisor, Real-Time Performance Panel, and Metrics Charts, to name a few.

Suppose you notice that your database queries are running slower. The first place you might go is the metrics charts, to look at the “Opcounters” metric and see whether you have more operations running. You might also look at the “Operation Execution Time” to see if your queries are taking longer to run. The “Query Targeting” metric shows the ratio of the number of documents scanned to the number of documents returned. This datapoint is a great measure of the overall efficiency of a query — the higher the ratio, the less efficient the query. These and other metrics can help you identify performance issues with your overall cluster, which you can then use as context to dive a level deeper and perform more targeted diagnostics of individual slow-running queries. MongoDB’s Performance Advisor takes this functionality a step further by automatically scanning your slowest queries and recommending indexes where appropriate to improve query performance.

Getting started with Performance Advisor

The Performance Advisor is a unique tool that automatically monitors MongoDB logs for slow-running queries and suggests indexes to improve query performance. Performance Advisor also helps improve both your read and write performance by intelligently recommending indexes to create and/or drop (Figure 1). These suggestions are ranked by their determined impact on your cluster. Performance Advisor is available on M10 and above clusters in MongoDB Atlas as well as in Cloud Manager and Ops Manager.

Figure 1: Performance Advisor can recommend indexes to create or drop.

Performance Advisor will suggest which indexes to create, which queries will be affected by the index, and the expected improvements to query performance. All of these interactions are available directly within the Performance Advisor user interface, and indexes can be created with just a few clicks. Figure 2 shows additional Performance Advisor statistics about the performance improvements this index would provide.
The performance statistics highlighted for each index recommendation include:

Execution Count: The number of queries per hour that would be covered by the recommended index
Avg Execution Time: The average execution time of queries that would be covered by the recommended index
Avg Query Targeting: The inefficiency of queries that would be covered by the recommended index, measured by the number of documents or index keys scanned in order to return one document
In Memory Sort: The number of in-memory sorts performed per hour for queries that would be covered by the recommended index
Avg Docs Scanned: The average number of documents that were scanned by slow queries with this query shape
Avg Docs Returned: The average number of documents that were returned by slow queries with this query shape
Avg Object Size: The average object size of all objects in the impacted collection

If you have multiple index recommendations, they are ranked by their relative impact on query performance, so the most beneficial index suggestion is displayed at the top.

Figure 2: Detailed performance statistics.

Creating optimal indexes ensures that queries are not scanning more documents than they return. However, creating too many indexes can slow down write performance, as each write operation needs to update each index. Performance Advisor provides suggestions on which indexes to drop based on whether they are unused or redundant (Figure 3). Users also have the option to “hide” indexes as a way to evaluate the impact of dropping an index without actually dropping it.

Figure 3: Performance Advisor shows which indexes are unused or redundant.

The Performance Advisor in MongoDB provides a simple and cost-efficient way to ensure you’re getting the best performance out of your MongoDB database. If you’d like to see the Performance Advisor in action, the easiest way to get started is to sign up for MongoDB Atlas, our cloud database service. Performance Advisor is available on MongoDB Atlas on M10 cluster tiers and higher.

Learn more from the following resources:

Monitor and Improve Slow Queries
Monitor Your Database Deployments
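For example, acting on a recommendation from your own code might look like the following hedged C# sketch, which creates a suggested compound index and hides an existing one for evaluation. The database, collection, field, and index names are hypothetical.

using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class IndexMaintenance
{
    public static async Task ApplyRecommendationAsync()
    {
        var db = new MongoClient(Environment.GetEnvironmentVariable("ATLAS_URI")).GetDatabase("store");
        var orders = db.GetCollection<BsonDocument>("orders");

        // Create the compound index suggested by Performance Advisor.
        var keys = Builders<BsonDocument>.IndexKeys.Ascending("customerId").Descending("orderDate");
        await orders.Indexes.CreateOneAsync(new CreateIndexModel<BsonDocument>(keys));

        // Hide a candidate-for-removal index so the planner ignores it while you evaluate the impact.
        await db.RunCommandAsync<BsonDocument>(new BsonDocument
        {
            { "collMod", "orders" },
            { "index", new BsonDocument { { "name", "status_1" }, { "hidden", true } } }
        });
    }
}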

November 22, 2022
Applied

Start on Your Journey to Operationalize AI-Enhanced Real-Time Applications with MongoDB and Databricks

MongoDB and Databricks have succeeded in two complementary worlds: For MongoDB, the focus is making the world of data easy for developers building applications. For Databricks, the focus is helping enterprises unify their data, analytics, and AI by combining a data lake's flexibility with the openness, performance, and governance of a data warehouse.

Traditionally, these operational and analytical functions have existed in separate domains built by different teams and serving different audiences. Though some will pretend a data warehouse can unify such disparate data and systems, the reality is this approach leaves you making false trade-offs where your developers, your data scientists, and, ultimately, your applications and customers suffer. Data warehouses are not designed to serve consumer-facing applications at scale and process machine learning in real time. It takes the unique application-serving layer of a MongoDB database, combined with the scale and real-time capabilities of a lakehouse such as Databricks, to automate and operationalize complex and AI-enhanced applications at scale.

We observed that a large and growing population of joint customers has for years enabled the flow of data between our two platforms to run real-time businesses and enable a world of application-driven analytics, using the MongoDB Connector for Apache Spark. So we asked ourselves: How could we make that a more seamless and elegant experience for these customers?

Today we're announcing that Databricks now features MongoDB as a data source within a Databricks notebook, giving data practitioners an easier, more curated experience for connecting Databricks with MongoDB Atlas data. This notebook experience makes it simpler for enterprises to deliver real-time analytics, handle complex data warehouse/BI workloads, and operationalize AI/ML pipelines using the MongoDB Spark Connector. In turn, developer and data teams can collaborate more closely on building a new generation of app-driven intelligence. MongoDB and Databricks are committed to further improving this integration in the coming months.

In this post, we'll explain how Databricks can be used as a real-time processing layer for data on MongoDB Atlas using the Spark Connector, extending MongoDB's built-in data processing capabilities like our aggregation framework. We'll also cover how to use Databricks' MongoDB notebook to make this even easier. In future posts we'll outline how to use MongoDB Atlas and Databricks Delta Lake to build sophisticated AI/ML pipelines.

Live application data plus the data lakehouse

MongoDB Atlas is a fully managed developer data platform that powers a wide variety of workloads - supporting everything from simple CRUD operations to sophisticated data processing pipelines for analytics and transformation - all with a common query interface. With MongoDB Atlas you can isolate operational and analytical workloads using dedicated analytics nodes. Analytics nodes are read-only nodes that can be exclusively targeted by your queries.

Let's look at an example. Assume you have long-running analytical queries that you want to run against your cluster, and your operations team does not want these queries competing for resources with your regular operational workload. To address this, you add an analytics node to your cluster and then target it in your connection string using an Atlas replica set tag.
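A hedged sketch of what that targeting can look like with the C# driver is shown below. Atlas typically tags analytics nodes with a nodeType replica set tag (shown here as ANALYTICS), but verify the tag name for your cluster, and treat the connection-string environment variable as a placeholder.

using System;
using MongoDB.Driver;

public static class AnalyticsConnection
{
    public static MongoClient Create()
    {
        var settings = MongoClientSettings.FromConnectionString(
            Environment.GetEnvironmentVariable("ATLAS_URI"));

        // Read from secondaries carrying the analytics tag so long-running queries
        // stay off the operational (electable) nodes.
        settings.ReadPreference = new ReadPreference(
            ReadPreferenceMode.Secondary,
            new[] { new TagSet(new[] { new Tag("nodeType", "ANALYTICS") }) });

        return new MongoClient(settings);
    }
}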
You can connect to the analytics nodes to run sophisticated aggregation queries, BI, and reporting workloads using the Atlas SQL interface, visualize your data using MongoDB Charts, or run Spark jobs using MongoDB’s Spark Connector. For more complex data science and warehousing analytical queries, many enterprises choose the Databricks Lakehouse Platform. Enterprises can also benefit from enriching MongoDB data with data from other internal or external sources in the Databricks Lakehouse.

The Databricks Lakehouse Platform combines the best elements of data lakes and data warehouses to deliver the reliability, strong governance, and performance of data warehouses with the openness, flexibility, and machine learning support of data lakes. This unified approach simplifies your modern data stack by eliminating the data silos that traditionally separate and complicate data engineering, analytics, BI, data science, and machine learning. With Databricks notebooks, developers and analytics teams can collaboratively write code in Python, R, Scala, and SQL, explore data with interactive visualizations, and discover new insights. You can confidently and securely share code with co-authoring, commenting, automatic versioning, Git integrations, and role-based access controls.

As good as MongoDB and Databricks are on their own, together we offer enterprises the unmatched ability to work with live application data across traditionally separate domains. This ability allows your teams to deliver what we call application-driven analytics. How does this work?

Using MongoDB and Databricks together

MongoDB and Databricks offer several ways to integrate the two systems, but the primary means is MongoDB’s Spark Connector. The Spark Connector can be used within Databricks notebooks to directly query live application data managed in MongoDB collections and then load it into data frames for further processing. You can also transform and/or enrich this data with data ingested from other sources using SparkSQL. Queries issued by the Spark Connector can be pushed down to MongoDB's aggregation framework and indexes for pre-processing, significantly improving query efficiency (measured in milliseconds, not seconds or minutes). Result sets generated from the Databricks notebooks can then be inserted back into MongoDB collections or pushed into Delta Lake for long-running analytics and machine learning.

Easier integration using Databricks' MongoDB notebook

A Databricks notebook is a web-based interface that contains runnable code, visualizations, and explanatory text in the form of paragraphs. It lets personas such as data scientists and data engineers build linked sets of code in different languages and visualize results in a format they are used to working with. Notebooks are great for collaboration and can be easily iterated on and improved. MongoDB and Databricks created an example notebook with sample code for:

Reading the data from MongoDB Atlas collections as is into Spark dataframes.
Pre-processing and filtering the data from Atlas collections using the aggregation framework, before passing it into Spark dataframes.
Enriching/transforming the data using SparkSQL.
Writing the enriched data back to the MongoDB Atlas collection.

Figure 1: Screenshot of data sources in a Databricks notebook.

This notebook can serve as an initial template for developers to start building complex transformation jobs on MongoDB data with the Databricks platform. Interested in a practical example of how this works?
Let's demonstrate how you can run analytics on a sample sales dataset using MongoDB's aggregation framework and visualize it with Charts. The example also explains how you can enrich this data using our Databricks notebook and load that back to MongoDB. Refer to the GitHub repo for the same. Figure 2:   Ways to integrate MongoDB and the Databricks Lakehouse Platform. In addition to Spark, MongoDB and Databricks provide seamless integration through shared Cloud Object stores to enable a more traditional data exchange using analytics-optimized formats such as Parquet, as well as event streaming integration using Apache Kafka. Together, MongoDB and Databricks offer unparalleled abilities to unify and process data from disparate systems in real-time. And now with the newly announced Databricks notebooks integration, data engineers and data scientists have an even easier and more intuitive interface to harness MongoDB data for their most sophisticated analytics and AI processing, making real-time applications more intelligent. Conclusion MongoDB Atlas along with Databricks Platform together will help organizations handle the increasing convergence between operational and analytical workloads. This convergence enables application-driven analytics and will help you build smarter applications and derive the right insights in real-time. Reach out to partners@mongodb.com to learn more.

November 21, 2022
Applied

5 Key Questions for App-Driven Analytics

Note: This article originally appeared in The New Stack . Data that powers applications and data that powers analytics typically live in separate domains in the data estate. This separation is mainly due to the fact that they serve different strategic purposes for an organization. Applications are used for engaging with customers while analytics are for insight. The two classes of workloads have different requirements—such as read and write access patterns, concurrency, and latency—therefore, organizations typically deploy purpose-built databases and duplicate data between them to satisfy the unique requirements of each use case. As distinct as these systems are, they're also highly interdependent in today's digital economy. Application data is fed into analytics platforms where it's combined and enriched with other operational and historical data, supplemented with business intelligence (BI), machine learning (ML) and predictive analytics, and sometimes fed back to applications to deliver richer experiences. Picture, for example, an ecommerce system that segments users by demographic data and past purchases and then serves relevant recommendations when they next visit the website. The process of moving data between the two types of systems is here to stay. But, today, that’s not enough. The current digital economy, with its seamless user experiences that customers have come to expect, requires that applications also become smarter, autonomously taking intelligent actions in real time on our behalf. Along with smarter apps, businesses want insights faster so they know what is happening “in the moment.” To meet these demands, we can no longer rely only on copying data out of our operational systems into centralized analytics stores. Moving data takes time and creates too much separation between application events and analytical actions. Instead, analytics processing must be “shifted left” to the source of the data—to the applications themselves. We call this shift application-driven analytics . And it’s a shift that both developers and analytics teams need to be ready to embrace. Find out why the MongoDB Atlas developer data platform was recently named a Leader in Forrester Wave: Translytical Data Platforms, Q4 2022 Defining required capabilities Embracing the shift is one thing; having the capabilities to implement it is another. In this article, we break down the capabilities required to implement application-driven analytics into the following five critical questions for developers: How do developers access the tools they need to build sophisticated analytics queries directly into their application code? How do developers make sense of voluminous streams of time series data? How do developers create intelligent applications that automatically react to events in real time? How do developers combine live application data in hot database storage with aged data in cooler cloud storage to make predictions? How can developers bring analytics into applications without compromising performance? To take a deeper dive into app-driven analytics—including specific requirements for developers compared with data analysts and real-world success stories—download our white paper: Application-Driven Analytics . 1. How do developers access the tools they need to build sophisticated analytics queries directly into their application code? 
To unlock the latent power of application data that exists across the data estate, developers rely on the ability to perform CRUD operations, sophisticated aggregations, and data transformations. The primary tool for delivering on these capabilities is an API that allows them to query data any way they need, from simple lookups to building more sophisticated data processing pipelines. Developers need that API implemented as an extension of their preferred programming language to remain "in the zone" as they work through problems in a flow state. Alongside a powerful API, developers need a versatile query engine and indexing that returns results in the most efficient way possible. Without indexing, the database engine needs to go through each record to find a match. With indexing, the database can find relevant results faster and with less overhead. Once developers start interacting with the database systematically, they need tools that can give them visibility into query performance so they can tune and optimize. Powerful tools like MongoDB Compass let users monitor real-time server and database metrics as well as visualize performance issues . Additionally, column-oriented representation of data can be used to power in-app visualizations and analytics on top of transactional data. Other MongoDB Atlas tools can be used to make performance recommendations , such as index and schema suggestions to further streamline database queries. 2. How do you make sense of voluminous streams of time series data? Time series data is typical in many modern applications. Internet of Things (IoT) sensor data, financial trades, clickstreams, and logs enable businesses to surface valuable insights. To help, MongoDB developed the highly optimized time series collection type and clustered indexes. Built on a highly compressible columnar storage format, time series collections can reduce storage and I/O overhead by as much as 70%. Developers need the ability to query and analyze this data across rolling time windows while filling any gaps in incoming data. They also need a way to visualize this data in real time to understand complex trends. Another key requirement is a mechanism that automates the management of the time series data lifecycle. As data ages, it should be moved out of hot storage to avoid congestion on live systems; however, there is still value in that data, especially in aggregated form to provide historical analysis. So, organizations need a systematic way of tiering that data into low-cost object storage in order to maintain their ability to access and query that data for the insights it can surface. 3. How do you create intelligent applications that automatically react to events in real time? Modern applications must be able to continuously analyze data in real time as they react to live events. Dynamic pricing in a ride-hailing service, recalculating delivery times in a logistics app due to changing traffic conditions, triggering a service call when a factory machine component starts to fail, or initiating a trade when stock markets move—these are just a few examples of in-app analytics that require continuous, real-time data analysis. MongoDB Atlas has a host of capabilities to support these requirements. With change streams , for example, all database changes are published to an API, notifying subscribing applications when an event matches predefined criteria. 
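For instance, a minimal, hypothetical C# sketch of the change stream pattern described above might look like this; the database, collection, and field names are illustrative only.

using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class TradeWatcher
{
    public static async Task WatchAsync()
    {
        var trades = new MongoClient(Environment.GetEnvironmentVariable("ATLAS_URI"))
            .GetDatabase("markets")
            .GetCollection<BsonDocument>("trades");

        // Only surface inserts; further $match conditions could inspect fields of the new document.
        var pipeline = new EmptyPipelineDefinition<ChangeStreamDocument<BsonDocument>>()
            .Match(change => change.OperationType == ChangeStreamOperationType.Insert);

        using var cursor = await trades.WatchAsync(pipeline);
        await cursor.ForEachAsync(change =>
        {
            // React in-app: recompute a price, raise an alert, refresh a dashboard, and so on.
            Console.WriteLine(change.FullDocument["symbol"]);
        });
    }
}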
Atlas triggers and functions can then automatically execute application code in response to the event, allowing you to build reactive, real-time, in-app analytics. 4. How do you combine live application data in hot database storage with aged data in cooler cloud storage to make predictions? Data is increasingly distributed across different applications, microservices , and even cloud providers. Some of that data consists of newly ingested time-series measurements or orders made in your ecommerce store and resides in hot database storage. Other data sets consist of older data that might be archived in lower cost, object cloud storage. Organizations must be able to query, blend, and analyze fresh data coming in from microservices and IoT devices along with cooler data, APIs, and third-party data sources that reside in object stores in ways not possible with regular databases. The ability to bring all key data assets together is critical for understanding trends and making predictions, whether that's handled by a human or as part of a machine learning process. 5. How can you bring analytics into your applications without compromising their performance? Live, customer-facing applications need to serve many concurrent users while ensuring low, predictable latency and do it consistently at scale. Any slowdown degrades customer experience and drives customers toward competitors. In one frequently cited study, Amazon found that just 100 milliseconds of extra load time cost them 1% in sales . So, it's critical that analytics queries on live data don’t affect app performance. A distributed architecture can help you enforce isolation between the transactional and analytical sides of an application within a single database cluster . You can also use sophisticated replication techniques to move data to systems that are totally isolated but look like a single system to the app. Next steps to app-driven analytics As application-driven analytics becomes pervasive, the MongoDB Atlas developer data platform unifies the core data services needed to make smarter apps and improved business visibility a reality. Atlas does this by seamlessly bridging the traditional divide between transactional and analytical workloads in an elegant and integrated data architecture. With MongoDB Atlas, you get a single platform managing a common data set for both developers and analysts. With its flexible document data model and unified query interface, the Atlas platform minimizes data movement and duplication and eliminates data silos and architectural complexity while unlocking analytics faster and at lower cost on live operational data. It does all this while meeting the most demanding requirements for resilience, scale, and data privacy. For more information about how to implement app-driven analytics and how the MongoDB developer data platform gives you the tools needed to succeed, download our white paper, Application-Driven Analytics .

November 21, 2022
Applied

MongoDB World 2022 Recap — Performance Gotchas of Replicas Spanning Multiple Data Centers

Indeed has more than 25 million open jobs online at any one time. It stores more than 225 million resumes on Indeed systems, and it has 250 million unique users every month. Indeed operates enterprise-wide global clusters in the cloud across multiple availability zones all around the world, including the United States, Asia-Pacific, Europe, and Australia. Indeed is also a MongoDB super user. About 50% of everything Indeed does is built on MongoDB. In a recent session at MongoDB World 2022, Indeed senior cloud database engineer Alex Leong shared real-world experiences of performance issues when spanning replica sets across multiple data centers. He also covered how to identify these issues and, most importantly, how to fix them. This article provides highlights from Leong’s presentation, including dealing with changes in sync sources, replication lags, and more. Resilience and performance Indeed maintains multiple data centers for resiliency. Having multiple data centers ensures there's no single point of failure and keeps data in close proximity to job seekers' locations. This approach facilitates faster response times and better overall end user experience. Running multiple data centers can introduce other performance issues, however. One issue involves the initial sync of new nodes in the system, which needs to happen as quickly as possible to avoid returning stale data. Write concern is a critical consideration because, if there's an interruption on a primary node and a failover to a secondary, when you eventually roll back to the primary, any changes that were captured on the secondary while the system was running in failover mode must be preserved. Also, when you're running multiple data centers, changes in sync sources can occur that go unnoticed. Replication lags can occur when data centers are located far apart from each other. Overriding sync sources When you have an environment with hundreds of millions of users and enormous volumes of data spanning several geographic regions, spinning up and synchronizing a new node in a replica set creates logistical hurdles. To start, you have to decide where the new node syncs from. It seems logical that the default decision would be to sync with the nearest node. But, as Leong said in his session, at times you may not get the nearest sync source, and you may have to override the default sync source to choose the best one. This decision needs to be made early, Leong said, because doing so later means any progress you've made toward syncing the new node will have been wasted. Replication lags Replication lags can occur between the primary and secondary nodes for several reasons, including downtime (planned or unplanned) on the primary server, a network failure, or disk failure. Whatever the reason, there are ways to speed things up. In his session, Leong illustrates how to use the WiredTiger cache size to accelerate replication between nodes. Changes in sync sources Leong uses the term sync topology to describe how primary and secondary nodes are configured for syncing data between them. In some scenarios, a secondary node can change its sync source (sync topology) from one node to another, perhaps because the first node was busy at the time. MongoDB makes this change automatically, and it might not be noticed without looking at the log. Fixing cross-data center write concerns According to Leong, when write performance decreases, 99% of the time it's because of a change in sync sources. 
To be proactive, Leong creates a write performance monitor to identify and self-heal decreases in write performance so he doesn't have to find out the hard way (from users). Other critical performance issues covered in the session include chained replication , which is the process by which secondary nodes replicate from node to node, changing write concern when a secondary node goes down, and how to configure write concerns across Availability Zones in AWS. For more details, watch the complete session from MongoDB World 2022: Performance Gotchas of Replicas Spanning Multi Datacenters .
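Two of the techniques discussed in the session can be sketched with the C# driver as shown below. This is an illustrative, hedged example only; the hostnames, database, and collection names are hypothetical, and replSetSyncFrom must be run against the specific secondary whose sync source you want to change.

using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class ReplicaSetTuning
{
    public static async Task ApplyAsync()
    {
        // 1. Override a secondary's sync source: connect directly to that secondary
        //    and issue replSetSyncFrom against its admin database.
        var secondary = new MongoClient("mongodb://secondary-host:27017/?directConnection=true");
        await secondary.GetDatabase("admin").RunCommandAsync<BsonDocument>(
            new BsonDocument("replSetSyncFrom", "nearby-node.example.net:27017"));

        // 2. Require majority acknowledgment so writes accepted during a failover
        //    cannot be rolled back when the original primary returns.
        var orders = new MongoClient(Environment.GetEnvironmentVariable("ATLAS_URI"))
            .GetDatabase("app")
            .GetCollection<BsonDocument>("orders")
            .WithWriteConcern(WriteConcern.WMajority);

        await orders.InsertOneAsync(new BsonDocument("status", "created"));
    }
}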

November 17, 2022
Applied

Enhancing the .NET Development Experience with Roslyn Static Analysis

The MongoDB .NET/C# driver introduces idiomatic APIs for constructing queries and aggregations: LINQ and Builders. These APIs eliminate the need to write native MongoDB Query Language (MQL), but they also introduce some overhead when it comes to troubleshooting and optimizing the underlying MQL. Because the generated MQL cannot be inspected at compile time, troubleshooting queries involves outputting MQL at runtime and/or inspecting runtime exceptions. Given that MQL generation from a C# expression is basically transpiling, we knew that, theoretically, inferring the general form of the MQL at compile time was solvable by static analysis. This realization, and the fact that the .NET ecosystem has an amazing framework for writing static analyzers (Roslyn), made me excited to try out this idea during MongoDB Skunkworks week. In this article, I will share my experience of forming a plan for this project, crafting a quick proof of concept during Skunkworks week, and eventually releasing the first public version.

Skunkworks at MongoDB

One of my favorite perks of working at MongoDB is that we get a whole week, twice a year, to focus on our own projects. This week is a great opportunity to meet and collaborate with other folks in the company, try out any ideas we want, or learn something new. I started my Skunkworks week by refreshing my Roslyn skills. While a week sounds like a fair amount of time for rapid prototyping, naturally I still had to settle on just a small subset of all the cool features that came to mind. I was lucky and, by the end of the Skunkworks, I had a MongoDB Analyzer for .NET prototype sufficient to demonstrate the feasibility of this idea.

Roslyn analyzers

A significant part of the .NET ecosystem is the open source .NET Compiler Platform SDK (Roslyn API). This SDK is well integrated into the .NET build pipeline and IDEs (e.g., Visual Studio, Rider), which allows for the creation of tools for code analysis and generation. The Roslyn SDK exposes the standard compiler's building blocks. The main ones used in the Analyzer project are:

Abstract syntax tree (AST): Data structure representing the text of the analyzed code.
Symbol table: Data structure that holds information about variables, methods, classes, interfaces, types, and other language elements. Each node in the AST can have a corresponding symbol.
Emit API: API that allows you to generate new IL code dynamically and compile it to an in-memory assembly, which can be loaded and executed in the same application.

The Roslyn SDK provides a convenient API to develop and package a code analyzer, which can be easily integrated into a .NET project and executed as part of the build pipeline. Or, it can expose an interactive UI in an IDE, thereby enriching the developer experience and enforcing project-specific rules.

Design approach

The .NET/C# driver provides an API to render any LINQ or Builders expression to MQL. The next logical step is to identify the needed expressions and use the driver to extract the matching MQL. Extracting the Builders or LINQ expression syntax nodes from the syntax tree provided by Roslyn was fairly straightforward. The next step, therefore, is to create a new syntax tree and add these expression syntax nodes combined with MQL-generating syntax. Then, this new syntax tree is compiled into executable code, which is dynamically invoked to generate the MQL.
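As a simplified, hypothetical sketch of that compile-and-invoke step (not the Analyzer's actual implementation), the Emit API usage looks roughly like this:

using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class DynamicMqlCompilation
{
    // Compile a generated syntax tree into an in-memory assembly.
    public static Assembly CompileToAssembly(SyntaxTree tree, IEnumerable<MetadataReference> references)
    {
        var compilation = CSharpCompilation.Create(
            "MqlGeneration",
            new[] { tree },
            references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        using var ms = new MemoryStream();
        if (!compilation.Emit(ms).Success)
            throw new InvalidOperationException("Dynamic compilation failed.");

        return Assembly.Load(ms.ToArray());
    }

    // Invoke a generated static method via reflection to obtain the rendered MQL string.
    public static string Render(Assembly assembly, string typeName, string methodName) =>
        (string)assembly.GetType(typeName)!.GetMethod(methodName)!.Invoke(null, null)!;
}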
To optimize this process, the Analyzer maintains a template syntax tree containing sample MQL generation code for an expression:

public class MQLGenerator
{
    public static string RenderMQL()
    {
        var buildersDefinition = Builders<MqlGeneratorTemplateType>.Filter.Gt(p => p.Field, 10);
        return Renderer.Render(buildersDefinition);
    }
}

From this template, a new single syntax tree is produced for each Analyzer run by dynamically adding a RenderMQL_N method for each analyzed expression N and replacing the expression placeholder with the analyzed expression:

public static string RenderMQL_1()
{
    var buildersDefinition = AnalyzedBuildersExpression;
    return Renderer.Render(buildersDefinition);
}

Next, the compilation unit is created from the syntax tree containing all the analyzed expressions and emitted to an in-memory assembly (Figure 1). This assembly is loaded into the Analyzer AppDomain, from which the MQLGenerator object is instantiated, which provides the actual MQL by invoking the RenderMQL_N methods.

Figure 1: LINQ and Builders expressions extraction and MQL generation.

This approach imposed four fundamental challenges, discussed below:

Data types resolution: Expressions are strongly typed, while the types are usually custom types that are defined in the user code.
Variables resolution: Expressions usually involve variables, constants, and external methods. The Analyzer cannot resolve those dependencies at compile time.
Driver versions: Different driver versions might render different MQL. The exact driver version referenced by the analyzed code has to be used.
Testing: The Roslyn out-of-the-box testing template lets you test analyzers on C# code provided as a simple string, which imposes significant maintainability challenges for a large number of tests.

Data types resolution

Given a simple LINQ expression that retrieves all the movies produced by Christopher Nolan from the movies collection:

var moviesCollection = db.GetCollection<Movie>("movies").AsQueryable();
var movies = moviesCollection.Where(movie => movie.Producer == "Christopher Nolan");

The underlying Movie type, and all types Movie depends upon, must be ported into the Analyzer compilation space. All imported types must exactly reproduce the original namespace hierarchy, and expressions like db.GetCollection<Movie> must be rewritten with fully qualified names to avoid naming collisions and namespace resolution issues. For example, user code could contain both Namespace1.Movie and Namespace2.Movie. An additional problem with importing the types directly is the unbounded complexity of method and property implementations, which in most cases could not be compiled in the Analyzer compilation space. This excess code plays no role in MQL generation and must not be imported into the compilation unit.

We decided that an easier and cleaner solution was to create a unique type name for each referenced type under a single namespace. The Analyzer uses the semantic model to inspect the Movie type defined in the user’s code and creates a new MovieNew syntax node mirroring all Movie properties and fields. This process is repeated for each type referenced by Movie, including enums, arrays, and collections (Figure 2). After creating the MovieNew type as a syntax declaration, the original LINQ expression must be rewritten to reference the new type. Therefore, the original expression is transformed into a new expression: db.GetCollection<MovieNew>("movies").

Figure 2: LINQ and Builders expressions extraction, data types resolution, and MQL generation.
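A heavily simplified, hypothetical sketch of that mirroring step is shown below; it handles only flat property lists and ignores the nested types, enums, and collections the real Analyzer has to deal with.

using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class TypeMirroring
{
    // Build a flattened mirror of the user's type from its symbol (properties only).
    public static ClassDeclarationSyntax MirrorType(ITypeSymbol original, string newName)
    {
        var properties = original.GetMembers()
            .OfType<IPropertySymbol>()
            .Select(p => SyntaxFactory.ParseMemberDeclaration(
                $"public {p.Type.ToDisplayString()} {p.Name} {{ get; set; }}")!)
            .ToArray();

        return SyntaxFactory.ClassDeclaration(newName)
            .AddModifiers(SyntaxFactory.Token(SyntaxKind.PublicKeyword))
            .AddMembers(properties);
    }
}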
Variables resolution

In practice, LINQ and Builders expressions mostly reference variables as opposed to simple constants. For example:

var movies = moviesCollection.Where(movie => movie.Title == movieName);

At runtime, the movieName value is resolved, and MQL is generated with a constant value. For example, the above expression can result in the following MQL:

aggregate([{ "$match" : { "Title" : "Dunkirk" } }])

This constant value is not available to the Analyzer at compile time, so a workaround is needed. Instead of presenting the constant, the Analyzer outputs the variable name:

aggregate([{ "$match" : { "Title" : movieName } }])

As you can see, this technique does not produce valid MQL. Most importantly, though, it preserves the MQL shape and retains the referenced variable information. This is done by replacing each external variable and method reference in the original expression with a unique constant, and then substituting it back into the resulting MQL (Figure 3).

Figure 3: LINQ and Builder expressions extraction, constants remapping, data types resolution, and MQL generation.

Driver versions

The naive approach would be to embed a fixed driver dependency into the Analyzer. However, this approach imposes some significant limitations, including:

MQL accuracy degradation: Different versions of the driver can produce slightly different MQL due to bug fixes and/or new features.

Backward compatibility: Expressions written with older driver versions might not be supported or might result in different MQL.

Forward compatibility: The Analyzer would not be able to process new expressions supported by newer driver versions. This issue could be resolved by releasing a new Analyzer version for each driver version, but ideally we wanted to avoid such development overhead.

Instead of embedding a driver package with a fixed version into the Analyzer package, and limiting the Analyzer to that specific driver version, the Analyzer uses the actual driver package that is referenced by the user's project and found on the user's machine. In this sense, the Analyzer is driver-version agnostic. One of the challenges was to dynamically resolve the correct driver version for each compilation, because the C# dynamic compilation tries to resolve its dependencies from the current AppDomain. To solve this, the Analyzer overrides the global AppDomain assembly resolution and loads the correct driver assemblies for each resolution request.

An additional nuance was loading the correct .NET framework version. The Analyzer usually runs on a different .NET platform than the project's .NET target (e.g., the Analyzer can run in Visual Studio on .NET Framework 4.7.2, while the analyzed project references the .NET Standard 2.1 driver). Luckily, all recent driver distributions contain a .NET Standard 2.0 build, which is supported by both the .NET Core and .NET Framework platforms. The next step is to identify the physical location of the .NET Standard 2.0 driver assemblies with the correct version (Figure 4). This approach allows the Analyzer to be driver-version agnostic, including supporting future driver versions, regardless of the OS platform (e.g., Rider on Linux/Mac, Visual Studio on Mac/Windows, .NET builds on Linux/Mac/Windows).

Figure 4: LINQ and Builder expressions extraction, constants remapping, data types resolution, driver version resolution, and MQL generation.
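A minimal sketch of such an assembly-resolution hook is shown below. The Register helper, the driverPackagePath parameter, and the folder layout are assumptions for illustration; the Analyzer's real resolution logic differs in its details.

using System;
using System.IO;
using System.Reflection;

public static class DriverAssemblyResolverSketch
{
    // driverPackagePath: assumed location of the MongoDB driver NuGet package
    // restored by the analyzed project (hypothetical; discovered at analysis time).
    public static void Register(string driverPackagePath)
    {
        AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
        {
            var requested = new AssemblyName(args.Name);

            // Only intercept requests for MongoDB driver assemblies.
            if (!requested.Name.StartsWith("MongoDB.", StringComparison.Ordinal))
            {
                return null;
            }

            // Prefer the .NET Standard 2.0 build, which both .NET Framework and .NET Core can load.
            var candidate = Path.Combine(
                driverPackagePath, "lib", "netstandard2.0", requested.Name + ".dll");

            return File.Exists(candidate) ? Assembly.LoadFrom(candidate) : null;
        };
    }
}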
Testing

Writing tests for such a project requires an unorthodox testing methodology as well. The Roslyn SDK provides a testing framework for writing integration tests: an integration test receives a C# code snippet to be analyzed, supplied as a string, and then executes the Analyzer on it. This default methodology introduces some inconveniences, though. Writing and maintaining hundreds of test cases, each containing multi-line C# code and complex data types embedded in a plain string, without compiler support, involves quite a lot of overhead.

Therefore, we extended the testing framework by creating a custom test runner in the following way. All the C# code for the integration tests is written as a standalone C# project, which is compiled in the standard way, so common underlying data types and other code elements are easily reused. Each intended test method is marked with a custom attribute denoting the expected result. An additional test project references the former project and uses reflection to identify the test cases denoted by these attributes. It then executes the Analyzer on the test cases' C# files with the appropriate driver version and validates the results.

For example, for the LINQ expression .Where(u => u.Name.Trim() == "123"), we expect the Analyzer to produce a warning for LINQ2 and valid MQL for LINQ3. The test case is written in the following way:

[NotSupportedLinq2("Supported in LINQ3 only: db.coll.Aggregate([{ \"$match\" : { \"Name\" : /^\\s*(?!\\s)123(?<!\\s)\\s*$/s } }])")]
[MQLLinq3("db.coll.Aggregate([{ \"$match\" : { \"Name\" : /^\\s*(?!\\s)123(?<!\\s)\\s*$/s } }])")]
public void String_methods_Trim()
{
    _ = GetMongoQueryable()
        .Where(u => u.Name.Trim() == "123");
}

The Analyzer testing framework parses the C# test cases project and creates a test case for each (DriverVersion, LinqProviderVersion, TestCase) combination, as shown in Figure 5.

Figure 5: Test cases dynamically generated from C# code for each tested driver version, as discovered in the Visual Studio Test Explorer.

This approach allows smooth integration with the Visual Studio test runner and a seamless development experience. Besides significantly increasing maintainability and readability, it also introduces a bonus feature: the test code project can be opened as a standalone solution (without the test framework), and the Analyzer output can be visually inspected for each test case exactly as a user would see it.

From initial idea to first release

Because the Skunkworks project proved to be successful, the decision was made to develop a first public release. Generally, developing and releasing a greenfield product is a lengthy process in most companies, involving resource allocation and planning, productizing, marketing, quality assurance, documentation, and support. At MongoDB, however, this process was incredibly fast. We formed a remote ad hoc team across two continents, involving product management, documentation experts, developer relations, marketing specialists, and developers. Despite working together as a team for the first time, the collaboration was amazing, and the high level of professionalism and motivation allowed everybody to do their part extremely efficiently with almost zero overhead. As a result, we developed and released a fully working product, documentation, marketing materials, and a support environment in less than three months.

Learn more about our internal Skunkworks hackathon and some of the projects MongoDB engineers built this year.

November 17, 2022
Engineering Blog

Built by MongoDB: Qubitro Makes Device Data Accessible Anywhere it's Needed

Increased cloud adoption and the expansion of 5G networks are expected to drive growth in IoT technologies over the next few years. Emergent IoT technologies are poised to transform businesses and the social fabric, including healthcare, smart homes and cities, and the government sector. Delaware-based startup Qubitro looks to capitalize on the potentially explosive growth in IoT technology by helping companies bring smart solutions to market faster. Qubitro, which is also a member of the MongoDB for Startups program, offers the fastest way of collecting and processing device data to activate it wherever it's needed.

Product vision

Qubitro founder and CEO Beray Bentesen estimates that there are now billions of devices producing massive amounts of data. The company's mission, he says, is to make device data accessible anywhere it's needed as fast as possible and at a lower cost than ever before. By collecting device data from multiple networks and providing various developer toolkits for activating data in applications, Qubitro enables data-driven decision making and modern application development. The company has two main products: the Qubitro Portal, a user interface where users can collaborate with other members or their internal team and create real-time actions such as rules and output integrations with their applications, and developer tools, including APIs and SDKs, that allow for custom solutions without having to develop data infrastructure from scratch.

Bentesen wants Qubitro to become the fabric of a digital transformation powered by device data. "We aim to make any data published from devices flow over our network and make any application that relies on device data to integrate with our services," Bentesen says. The ideal Qubitro customer is one that needs to put device data into their solutions. "It could be startups, IoT-adopting enterprises, or custom solution providers," Bentesen says. The company has also been heavily investing in developer experience, he adds.

A platform to build upon

The secret to building a platform that can process data in milliseconds with privacy and user experience combined is, not surprisingly, another platform — specifically, the MongoDB Atlas developer data platform. "We offer managed connectivity solutions, user interface, and the APIs," Bentesen says. "So we process tons of data. And MongoDB is in the middle of all those inputs and outputs." The MongoDB for Startups program helps startups build faster and scale further with free MongoDB Atlas credits, one-on-one technical advice, co-marketing opportunities, and access to a vast partner network. Bentesen says the company has benefitted from being in the program in a number of ways. "In the early days when we joined the program, we were able to get answers to questions that would take probably weeks or maybe more if you search on the internet," he says. "We were able to understand what to develop, which saved us a lot of time and of course expense."

The MongoDB Atlas platform also helps Qubitro's developers during those crucial stages prior to launching a new feature and as the product grows in popularity. "With MongoDB Atlas, we could test our development environment before going to production," Bentesen says. "And as we scale, we're able to observe the traffic through MongoDB Atlas and optimize thanks to the tools MongoDB offers, like MongoDB Compass, without dealing with code or complex environments." MongoDB's document model database made it an easy choice for the company's needs.
"We decided to use MongoDB because it's a flexible environment," Bentesen says. "We knew we would have to build new features over time. So we needed to go with a flexible database. We're still adding more and more features without breaking the entire system. We wanted that flexibility, and in a managed cloud offering, which MongoDB gives us." Bentesen also cites MongoDB's Time Series collections as one of the features he's most excited about, since the vast majority of IoT solutions rely on time series data. Looking forward Bentesen says Qubitro will likely add more enterprise features in the future. The more they grow, he says, the more insight they're getting about what customers want. The company also plans to invest heavily in growing its community of users and, of course, attracting more talent. Bentensen says the company fully embraces the remote-first culture and believes they can work faster working remotely. If you're looking forward to building the next generation of connected solutions, visit Qubitro.com , join the company's Discord server , or have a chat anytime, even weekends! Are you part of a startup and interested in joining the MongoDB for Startups program? Apply now . For more startups content, check out our previous blog on ChargeHub .

November 9, 2022
Applied

Hybrid Cloud: Flexible Architecture for the Future of Financial Services

Financial services companies are reimagining how they apply technology to meet the growing service demands of a digital-first world. As they recognize the operational and competitive advantages of the public cloud, many companies are migrating their computing needs to it as quickly as possible. For an industry with tight regulations, a vast amount of private data, and complex legacy infrastructure, however, moving every workload to the cloud isn't feasible just yet. Instead, some companies are moving to hybrid cloud, an architecture that enables them to use the public cloud wherever possible, while keeping those applications and data with tricky legal or reputational exposure on in-house systems. In this article, we'll examine the advantages of a hybrid cloud approach and outline steps to consider when preparing for such a shift.

Overview

Hybrid cloud integrates public cloud and on-premises infrastructure into a single functioning unit. Through the public cloud, institutions gain valuable versatility, agility, and scale to run applications more efficiently and to turbo-charge experimentation. They can use existing infrastructure to handle sensitive workloads — including those storing Personally Identifiable Information (PII) — within a familiar, time-tested environment. Deciding where to host applications is usually a function of a workload's data secrecy and sovereignty requirements and an institution's assessment of the risks and opportunities related to them. Developing the technical flexibility to move between public and private infrastructures makes it easier to match those requirements to the environment best suited to fulfill them.

Advantages of hybrid cloud

A hybrid cloud approach offers many advantages. For example, institutions can use public cloud infrastructure for tasks with dynamic resource requirements, such as payments processing over holidays or risk calculations for end-of-month reporting. This setup can reduce the delays, data center overhead, and sunk costs associated with adding in-house servers, some of which may ultimately be used only situationally or not at all. Companies also save on capital expenses and improve responsiveness to internal and external demands.

A hybrid cloud setup can also help organizations address compliance, resilience, and performance needs. Those operating in multiple countries can use in-house and public cloud resources across different regions to satisfy disparate requirements around data sovereignty and residency. This geographic and infrastructure diversity can also enhance a company's failover and disaster recovery profile. By co-locating applications in public cloud regions near customers, institutions can also improve service performance — an important factor as the industry moves toward mobile-first solutions.

As institutions pursue more efficient ways to work, the insight gained through planning and executing a hybrid cloud strategy can help inform and transform an organization's operations. Institutions can begin the shift to the continuous cadence of DevOps, DevSecOps, and MLOps teams by incorporating public cloud tools and methods. This approach includes using process automation and orchestration tools to streamline delivery and maintenance, and management applications to free up in-house IT resources from undifferentiated work. The following section describes other ways a hybrid approach can encourage changes to institutional conventions.
Rethinking budgeting

Although fixed infrastructure costs and investments can limit an organization's flexibility, most companies still budget for fixed costs and may find the usage-based billing of public cloud services unnerving. Making the shift from the transparency and stability of capital expenses for on-premises infrastructure to the unpredictability of operating expenses in public cloud procurement requires an organizational adjustment. Vendors do offer cost-management tools to help budget for and accommodate these changes. Hybrid cloud can help ease this transition as the organization moves into an infrastructure-as-a-service model.

Expanding a security mindset

The financial services sector is a high-target industry for cyberattacks. Data loss and leakage are also significant concerns. Organizations, therefore, often struggle to transfer any control over security and system integrity to a third party, and disparate regulations increase those hurdles. Sometimes, though, organizations overestimate the effectiveness of their in-house security teams and underestimate the security capabilities of the largest cloud providers, who, like banks, are charged with deflecting the most sophisticated attacks all day, every day. Cloud service providers and other third-party vendors invest heavily in security research and resources. They regularly certify to the highest compliance standards. They're also constantly developing new solutions to help institutions bridge gaps in their homegrown security measures and team capabilities. The result is that the security capabilities of the public cloud providers are often more advanced than those of in-house teams.

Simplifying infrastructure complexity

The largest financial institutions with the greatest global coverage face the biggest challenges in building a hybrid architecture. What's more, sunk investments in on-premises infrastructure can make it cost- and ROI-prohibitive to shift workloads to the public cloud. An architecture that can support a hybrid of public cloud, on-premises cloud, and bare-metal deployments offers a flexible solution to address this complexity.

Preparing for the shift

As with any big shift in technology, the move to hybrid presents a set of challenges that are as much cultural and operational as they are technical. Preparation for a hybrid cloud project, therefore, must include organizational readiness assessments across functions. It must take into consideration not just the technical, business, and monetary impact but also the legacy mindset and organizational rituals that can jeopardize the best-laid technology strategies. In pursuing a hybrid cloud strategy, institutions can begin to modernize outdated operating principles as they transform their approach to technology. Given the high uptake within the industry, the steps to adopt a public cloud-only model are well documented. In a hybrid cloud approach, however, a lack of expertise in integrating public and private cloud technologies is a frequent challenge. This, coupled with staff who may be reluctant to adopt unfamiliar technology, can create resistance among technology teams. Early successes with high management attention can create excitement; for wider adoption, strong central platform support through the infrastructure team, as well as training and transparency, can help staff get on board.
Other effective ways for an organization to prepare for a hybrid cloud future include setting clear business and technical goals, creating inventories of data and applications, and evaluating how customers might react to changes in responsiveness and security brought about by the switch to hybrid. Assessing in-house skills and managing the transformation anxiety of existing technical staff are also crucial to team preparedness. The following steps can help financial institutions prepare for a hybrid approach.

Know your company and your customer

Set goals: Companies that articulate clear goals for their hybrid strategy are more likely to achieve them. These goals might include gains in operating efficiency, more flexible development, cost savings, speed of innovation, IT resiliency, or regulatory flexibility.

Evaluate your customer profiles: Retail and institutional customers require different services and protections from their financial institutions. An understanding of these needs and concerns should inform any analysis of the potential for a hybrid cloud implementation. The storage of PII, for example, demands special consideration.

Profile your assets: Financial institutions house data and applications that perform business functions. Understanding these in a regulatory context, from a commercial perspective, and through a technical lens will influence decisions about how best to optimize them in a hybrid cloud environment.

Blend private and public cloud: To decrease effort, organizations should reduce differences between the two deployment methods. This aim is crucial for a successful adoption. Initiatives that require teams to manually request assets on the private cloud usually fail.

Build your team

Engage stakeholders: An effective hybrid cloud strategy engages functions across the enterprise. It incorporates business, legal, IT, and security priorities into a comprehensive plan. Engaging compliance officers and security professionals early on is critical, as compliance and system safeguards must be woven into the DNA of any hybrid cloud plan from the outset.

Assess skills and educate the team: At the start of a journey to hybrid cloud, organizations often lack the expertise and mindset to confidently shift to a new model. Simply understanding the myriad services offered by public cloud providers can be daunting, and combining public and private clouds in a hybrid setup requires another level of upskilling. Evaluating in-house teams to determine education and training needs is essential for the new paradigm.

Foster transparency: In any effective cloud strategy, transparency across the organization is crucial to gaining buy-in, just as education is crucial to building skills. A cloud adoption team can ensure that the training and cultural needs of the organization are met alongside financial, customer, and business imperatives. Engaging your cloud provider(s) in this process can help.

Map your migration

Start small: Small proofs of concept build confidence and allow teams to expand incrementally on the back of those successes.

Manage risk: Start with a low-risk approach. For example, organizations may choose to move workloads that are highly dynamic and less sensitive first. These might include some customer-facing apps that contain little PII. Institutions often start with retail applications, while still running their institutional-focused applications within their on-premises data centers.
This approach may change over time as they become more comfortable with public cloud security and managing a hybrid environment.

Moving to hybrid

Financial services institutions are already adapting to greater demands for innovation and efficiency by designing responsive IT environments and taking advantage of the public cloud — and often, hybrid cloud is a crucial part of that pathway. A hybrid cloud strategy is a great solution to help organizations meet their technical and business objectives more cost-efficiently and effectively than with either a public or private cloud alone. A hybrid cloud approach offers the flexibility that institutions need to meet rapidly changing customer demands as well as competition from a new wave of challengers. As best practices become clear and more implementation lessons emerge, the industry will further embrace hybrid cloud as an important step in an evolution toward a fully managed multi-cloud solution.

Finding the right partners, of course, is crucial. Experienced teams and the best technical solutions greatly increase the odds of executing a successful hybrid strategy. The team at MongoDB offers solutions and advice to help financial institutions progress toward a more functional, flexible, and future-forward enterprise technology platform. To learn more about how MongoDB can help you on your cloud adoption journey, check out the following resources:

Finance, Multi-Cloud, and The Elimination of Cloud Concentration Risk
How Financial Services Achieve A Strategic Advantage With Data-Driven Disruption
The Road to Smart Banking
The 5-Step Guide to Mainframe Modernization for Banks

November 3, 2022
Applied

Migrate to MongoDB Atlas on AWS with Relational Migrator

Competitive advantage is directly tied to how well companies are able to build software around their most important asset: data. Rigid relational schemas often require downtime and significant application code updates in order to make even simple modifications, such as adding a new data attribute. In MongoDB, entities are modeled as documents that map naturally to the same objects that developers are used to working with in their programming languages. Additionally, legacy relational databases were not built to scale horizontally: sharding data to handle large data volumes and ensure lower latency is typically a significant manual process that requires custom application logic to query across multiple shards and aggregate results. MongoDB Atlas is an effective solution for solving such problems, and MongoDB Relational Migrator streamlines the process of moving to MongoDB from a relational database.

MongoDB Atlas on Amazon Web Services (AWS)

MongoDB Atlas removes the need for a complex object-relational mapping layer and allows developers to build and release new features more quickly. MongoDB Atlas is built to be distributed and to handle sharding transparently to the developer. With Atlas, no application code changes are necessary when an application needs to scale out from a 10MB to a 500TB dataset. MongoDB Atlas is well integrated into the AWS environment, and the document-based database works seamlessly with AWS products. To learn more about common integration and project requirements, refer to Managed MongoDB on AWS.

Migrate to Atlas on AWS with Relational Migrator

Some customers have successfully migrated their relational workloads to MongoDB Atlas on AWS. One example is Cox Automotive, whose system was hitting the limitations of its relational database. The company migrated to Atlas on AWS and leveraged capabilities like Atlas Data Lake (powered by Amazon Simple Storage Service (Amazon S3)) and Atlas App Services. Read our customer case study to learn more about how Cox Automotive uses MongoDB Atlas.

At the same time, other companies have struggled with how to approach this challenge. When considering such a migration, it's important to think carefully about data modeling. Although it's possible to naively move a relational schema into MongoDB without any changes, this approach won't deliver many of MongoDB's benefits. A better practice is to design a new, more denormalized MongoDB schema, and potentially to take the opportunity to revise the architecture of the application as well. To make this process easier, we're developing MongoDB Relational Migrator. Relational Migrator streamlines the process of moving to MongoDB from a relational database and is compatible with Oracle, Microsoft SQL Server, MySQL, and PostgreSQL. MongoDB Relational Migrator connects to a relational database to analyze its existing schema, then helps architects design and map to a new MongoDB schema.

Migration support

When you're ready, Relational Migrator will perform the data migration from the source RDBMS to MongoDB. The migration can happen in a single shot if you're prepared for a hard cutover, and soon we will also support continuous sync in case you need to leave the source system running and continue pushing changes into MongoDB. With Relational Migrator, you can map your relational schema, or just a piece of it if needed, to a new MongoDB schema. Relational Migrator helps the design and mapping process with common MongoDB schema design patterns built in.
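As a purely hypothetical illustration of the kind of denormalization such a mapping typically targets, the C# classes below embed order line items inside the order document rather than keeping them in a separate joined table. The class and field names are invented for this example and are not output produced by Relational Migrator itself.

using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

// Relational model (sketch): an orders table plus an order_items table joined by an order ID.
// Document model (sketch): each order embeds its line items, so a single read returns the whole order.
public class Order
{
    [BsonId]
    public ObjectId Id { get; set; }

    public string CustomerName { get; set; }
    public DateTime OrderedAt { get; set; }

    // Embedded instead of joined: the former order_items rows live inside the order document.
    public List<OrderItem> Items { get; set; } = new List<OrderItem>();
}

public class OrderItem
{
    public string Sku { get; set; }
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
}

// Usage sketch (assumes a configured IMongoDatabase named db):
// var orders = db.GetCollection<Order>("orders");
// await orders.InsertOneAsync(new Order { CustomerName = "Ada", OrderedAt = DateTime.UtcNow });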
Based on this schema mapping, you can move data into a target MongoDB cluster. Relational Migrator will support both Snapshot (one-time) and continuous data migration. Get more details on Relational Migrator on the product page or in the deep dive presentation from MongoDB World. Get started with MongoDB Atlas in AWS Marketplace today.

November 3, 2022
Applied
