MongoDB Developer

Coding with MongoDB - news for developers, tips and deep dives

Accelerate Data Modernization with Infosys Data Model Converter

Are you in the process of migrating applications from a relational database to MongoDB? If so, you’re likely trying to decide how your enterprise data should be modeled. Our previous blog discussed how Infosys Data Services Suite can help enterprises move data seamlessly from legacy relational databases to MongoDB. But moving data is only one part of the puzzle. The more significant step is choosing the target data model, or schema design, a process that usually requires many hours of highly skilled work. That’s why we created this follow-up blog to help you get started.

Rethinking Schema Design

Ultimately, schema design can be the difference between an inefficient, disorganized database and a strategic one that empowers the entire company. Schema design in MongoDB requires a change in perspective for data architects, developers, and database administrators. They have to:

- Rethink the legacy relational data model. This model flattens data into rigid two-dimensional tabular structures of rows and columns. The new data model is a rich and dynamic one with embedded sub-documents and arrays.
- Rethink how the data platform works. In relational databases, it is extremely difficult to change the data platform as the application evolves. In MongoDB, however, the apps and APIs come first and the data platform dynamically accommodates the data.

Getting Schema Design Right

Begin the schema design process by considering the application’s requirements. You’ll want to model the data in a way that leverages the flexibility of the document model. In schema migrations, it may seem easy at first to simply mirror the flat schema of the relational database in the document model. However, this negates the advantages enabled by the rich and embedded data structures of the document model. For example, data that belongs to a parent-child relationship in two RDBMS tables can be collapsed (embedded) into a single document in MongoDB (a sketch of this appears below).

The application data access patterns should also drive schema design, with a specific focus on:

- The read/write ratio of database operations, and whether it is more important to optimize the performance of one operation over another
- The types of queries and updates performed by the database
- The lifecycle of the data and the growth rate of documents

Simplifying Schema Design with Infosys Data Model Converter

Infosys has developed a solution called Infosys Data Model Converter that takes the source relational schema and the above-mentioned signals as inputs and automatically provides target MongoDB schema suggestions. Infosys Data Model Converter is available as part of Infosys Modernization Suite, which accelerates enterprises’ modernization journey. Each schema suggestion is accompanied by a detailed analysis report. The data modeler can use this as a starting point and iterate over the schema to arrive at the final MongoDB schema. The Infosys Data Model Converter reduces 50-60% of the effort typically spent on schema design.
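To make the parent-child example concrete, here is a minimal sketch (the collection, field names, and values are hypothetical and not Infosys-specific) of how a parent table and a child table joined by a foreign key might collapse into a single MongoDB document:

// Relational model: a customers table plus a child addresses table joined by customer_id.
// Document model: the child rows become an embedded array inside the parent document.
db.customers.insertOne({
  _id: 101,
  name: "Meera Raj",
  addresses: [
    { type: "home", city: "Bengaluru", postalCode: "560001" },
    { type: "office", city: "Pune", postalCode: "411001" }
  ]
});

// One query now returns the parent and all of its children together, with no join required:
db.customers.findOne({ _id: 101 });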
Key Features

- Boosts productivity by augmenting the migration of RDBMS to NoSQL databases
- Saves time by automatically extracting schema, query, and data patterns from an existing RDBMS
- Comprehensively analyzes the RDBMS entity relations, data, and read-and-write patterns
- Applies a rich set of rules and generates a fully compliant NoSQL target state data model
- Offers flexibility by externalizing the rules for organization-specific customizations
- Connects and deploys the model to the target NoSQL platform with sample data

Discover more ways in which Infosys can help you unlock value from modernization. Contact us for any modernization questions.

April 15, 2021
Developer

Introducing: Atlas Operator for Kubernetes

The MongoDB Enterprise Operator serves to automate and manage MongoDB clusters on self-managed infrastructure. While this integration has provided complete control over self-managed MongoDB deployments from a single Kubernetes control plane, we’re taking it a step further by extending this functionality to our fully-managed database—MongoDB Atlas. We’re excited to introduce the trial version of the Atlas Operator for Kubernetes. The Atlas Operator will allow you to manage all your MongoDB Atlas clusters without ever having to leave Kubernetes. Keep your workflow as seamless and optimized as possible by managing the lifecycle of your cloud-native applications from wherever you want.

With the trial version of this Atlas Operator, you can provision and deploy fully-managed MongoDB Atlas clusters on the cloud provider of your choice through Kubernetes. This is especially important for those seeking to unlock the power of multi-cloud with unique tools and services native to AWS, Google Cloud, and Azure without any added complexity to the data management experience. With this new Atlas Operator, you get the best of all clouds with multi-cloud clusters on Atlas, coupled with the freedom to run your entire stack anywhere, all while managed in one central location.

The “trial version” simply means it has all the core functionality to provision fully-managed Atlas clusters, but the bells and whistles are yet to come. In addition to encapsulating core Atlas functionality, it ensures Kubernetes Secrets are created for each database user, which allows for easier management of sensitive data. The Atlas Operator also allows you to create IP Bindings so your applications can securely access clusters. If you’re interested in using the trial version of the Atlas Operator today, follow our quickstart guide below to get started!

Quickstart

Below you’ll find the steps to create your first cluster in Atlas using the Atlas Operator. Note that you need to have a running Kubernetes cluster before deploying the Atlas Operator.

1. Register/Login to Atlas and create API Keys for your Organization. This information, together with the Organization ID, will be used to configure the Atlas Operator’s access to Atlas.

2. Deploy the Atlas Operator:

kubectl apply -f \
  https://raw.githubusercontent.com/mongodb/mongodb-atlas-kubernetes/main/deploy/all-in-one.yaml

3. Create a Secret containing the connection information from step one.
This Secret will be used by the Atlas Operator to connect to Atlas:

kubectl create secret generic mongodb-atlas-operator-api-key \
  --from-literal="orgId=<the_atlas_organization_id>" \
  --from-literal="publicApiKey=<the_atlas_api_public_key>" \
  --from-literal="privateApiKey=<the_atlas_api_private_key>" \
  -n mongodb-atlas-system

4. Create the AtlasProject Custom Resource:

cat <<EOF | kubectl apply -f -
apiVersion: atlas.mongodb.com/v1
kind: AtlasProject
metadata:
  name: my-project
spec:
  name: Test Atlas Operator Project
  projectIpAccessList:
    - ipAddress: "0.0.0.0/0"
      comment: "Allowing access to database from everywhere (only for Demo!)"
EOF

5. Create the AtlasCluster Custom Resource:

cat <<EOF | kubectl apply -f -
apiVersion: atlas.mongodb.com/v1
kind: AtlasCluster
metadata:
  name: my-atlas-cluster
spec:
  name: "Test-cluster"
  projectRef:
    name: my-project
  providerSettings:
    instanceSizeName: M10
    providerName: AWS
    regionName: US_EAST_1
EOF

You'll have to wait until the cluster is ready, i.e. until the "status" field shows "ready:true":

kubectl get atlasclusters my-atlas-cluster -o=jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
True

6. Create a Secret for the password that will be used to log into the Atlas cluster database:

kubectl create secret generic the-user-password \
  --from-literal="password=P@@sword%"

7. Create the AtlasDatabaseUser Custom Resource (it references the password Secret):

cat <<EOF | kubectl apply -f -
apiVersion: atlas.mongodb.com/v1
kind: AtlasDatabaseUser
metadata:
  name: my-database-user
spec:
  roles:
    - roleName: "readWriteAnyDatabase"
      databaseName: "admin"
  projectRef:
    name: my-project
  username: theuser
  passwordSecretRef:
    name: the-user-password
EOF

8. Shortly afterwards, the Atlas Operator will create a Secret containing the data necessary to connect to the Atlas cluster. You can mount it into your application Pod and read the connection strings from the file or from an environment variable:

kubectl get secrets/test-atlas-operator-project-test-cluster-theuser \
  -o=jsonpath="{.data.connectionString.standardSrv}" | base64 -d

mongodb+srv://theuser:P%40%40sword%25@test-cluster.peqtm.mongodb.net

Stay Tuned for More

Be on the lookout for updates in future blog posts! The trial version of the MongoDB Atlas Operator is currently available on multiple marketplaces, but we’ll be looking to make enhancements in the near future. For more information, check out our MongoDB Atlas & Kubernetes GitHub page and our documentation.
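As a small follow-up to step 8: once that generated Secret is exposed to your application Pod as an environment variable, the connection string can be consumed directly by the driver. The sketch below is only an illustration; the variable name MONGODB_URI and the database and collection names are assumptions, not something the Operator defines.

// A minimal sketch: assumes the Secret's connectionString.standardSrv value
// was injected into the Pod as the MONGODB_URI environment variable
// (the variable name is an assumption, not set by the Operator).
const { MongoClient } = require("mongodb");

async function main() {
  const client = new MongoClient(process.env.MONGODB_URI);
  await client.connect();

  // Hypothetical database and collection, just to verify the connection works.
  const result = await client
    .db("test")
    .collection("healthchecks")
    .insertOne({ checkedAt: new Date(), source: "atlas-operator-demo" });

  console.log("Inserted document with _id:", result.insertedId);
  await client.close();
}

main().catch(console.error);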

April 8, 2021
Developer

MongoDB Connector for Apache Kafka 1.5 Available Now

Today, MongoDB has released version 1.5 of the MongoDB Connector for Apache Kafka! This article highlights some of the key features of this new release, which also continues to improve the overall quality and stability of the connector.

DeleteOne write model strategy

When messages arrive on Kafka topics, the MongoDB Sink Connector reads them and by default upserts them into the MongoDB cluster specified in the sink configuration. However, what if you didn’t always want to upsert them? This is where write model strategies come in: they provide you with the flexibility to define what you want to do with the document. While the concept of write strategies is not new to the connector, this release adds a new write strategy called DeleteOneBusinessKeyStrategy. It is useful when a topic contains records identifying data that should be removed from a collection in the MongoDB sink.

Consider the following: you run an online store selling fashionable face masks. As part of your architecture, the website sends orders to a Kafka topic, “web-orders”, which upon message arrival kicks off a series of actions such as sending an email confirmation and inserting the order details into an “Orders” collection in a MongoDB cluster. A sample Orders document:

{
  _id: ObjectId("6053684f2fe69a6ad3fed028"),
  'customer-id': 123,
  'order-id': 100,
  order: { lineitem: 1, SKU: 'FACE1', quantity: 1 }
}

This process works great. However, when a customer cancels an order, we need another business process to update our inventory, send the cancellation email, and remove the order from our MongoDB sink. In this scenario, a cancellation message is sent to another Kafka topic, “canceled-orders”. For messages in this topic, we don’t just want to upsert them into a collection; we want to read the message from the topic and use a field within the document to identify the documents to delete in the sink. For this example, let’s use the order-id key field and define a sink connector using the DeleteOneBusinessKeyStrategy as follows:

"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "FaceMaskWeb.OrderCancel",
"connection.uri": "mongodb://mdb1",
"database": "FaceMaskWeb",
"collection": "Orders",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.DeleteOneBusinessKeyStrategy",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
"document.id.strategy.partial.value.projection.type": "AllowList",
"document.id.strategy.partial.value.projection.list": "order-id",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": false,
"document.id.strategy.overwrite.existing": true

Now when messages arrive in the “FaceMaskWeb.OrderCancel” topic, the “order-id” field is used to delete documents in the Orders collection. For example, using the sample document above, if we put this value into the OrderCancel topic:

{ "order-id": 100 }

it would cause the document in the Orders collection with an order-id of 100 to be deleted. For a complete list of write model strategies, check out the MongoDB Kafka Connector sink documentation.

Qlik Replicate

Qlik Replicate is recognized as an industry leader in data replication and ingestion. With this new release of the connector, you can now replicate and stream heterogeneous data from data sources like Oracle, MySQL, Postgres, and others to MongoDB via Kafka and the Qlik Replicate CDC handler.
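As a rough sketch of what that looks like in a sink configuration (the topic name, connection URI, and target namespace below are placeholders, and the handler property is described in the next paragraph), the Qlik Replicate CDC handler is wired in with a single property:

"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "qlik.sourcedb.customers",
"connection.uri": "mongodb://mdb1",
"database": "CustomerData",
"collection": "Customers",
"change.data.capture.handler": "com.mongodb.kafka.connect.sink.cdc.qlik.rdbms.RdbmsHandler"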
To configure the MongoDB Connector for Apache Kafka to consume Qlik Replicate CDC events, use “com.mongodb.kafka.connect.sink.cdc.qlik.rdbms.RdbmsHandler” as the value for the change data capture handler configuration parameter, as the sketch above illustrates. The handler supports insert, refresh, read, update, and delete events.

Errant Record Reporting

Kafka Connect, the service which manages connectors that integrate with a Kafka deployment, has the ability to write records to a dead letter queue (DLQ) topic if those records could not be serialized or deserialized. Starting with Apache Kafka version 2.6, support was added for error reporting within the sink connectors. This gives sink connectors the ability to send individual records to the DLQ if the connector deems the records to be invalid or problematic. For example, you might be projecting fields in the sink that do not exist in the Kafka message, or your sink might expect a JSON document while the message arrives in a different format. In these cases, an error is written to the DLQ rather than failing the connector.

Various Improvements

As with every release of the connector, we are constantly improving the quality and functionality, and this release is no different. You’ll also see pipeline errors now showing up in the Connect logs, and the sink connector can now be configured to write to the dead letter queue!

Next Steps

- Download the latest MongoDB Connector for Apache Kafka 1.5 from the Confluent Hub!
- Read the MongoDB Connector for Apache Kafka documentation.
- Questions/Need help with the connector? Ask the Community.
- Have a feature request? Provide feedback or file a JIRA.

April 7, 2021
Developer

Dive Deeper into Chart Data with New Drill-Down Capability

With the latest release of MongoDB Charts, you’re now able to dive deeper into the data that’s aggregated in your visualizations. At a high level, we generally create charts, graphs, and visualizations of our data to answer questions about our business or products. Oftentimes, we need to “double click” on those visualizations to get insight into each individual data point that makes up the line, bar, column, etc.

How the drill-down functionality works:

Step 1: Right-click on the data point you are interested in drilling down into.
Step 2: Click "Show data for this item".
Step 3: View the data in tabular or document format.

Each view can be better for different circumstances. Data without too many fields or nested arrays might be quicker and more easily viewed in a table. On the other hand, the JSON view allows you to explore the structure of documents and click into arrays.

Scenarios where more detailed information can help:

Data visualization use cases are relatively broad, but oftentimes they fall into three main categories: monitoring data, finding insights, and embedding analytics into applications. I’ll be focusing on the first two of these three, as there are many different ways you could potentially build drill-down into data via embedded charts. (Read more about our click events and embedded analytics.)

For data or performance monitoring purposes, we're not speaking so much about the performance of your actual database and its underlying infrastructure, but the performance of the application or system built on top of the database. Imagine I have an application or website that takes reviews. If I build a chart like the one below, where I want to easily see when an interaction hits a threshold I want to dive deeper into, I now have the ability to quickly see the document that created that data point. This chart shows app ratings given after a user session in an app. For this example, we want to dive into any rating that was below a 3 (out of 5). This scatter plot shows I have two such ratings that cross that threshold. With the drill-down capability, I can easily see all the details captured in that user session.

For finding new insights, let’s imagine I’m tracking how many transactions happen on my ecommerce site over time. In the column chart below, you can see I have purchases by month for the last year and a half (note: there’s a gap because this example is for a seasonal business!). Just by glancing at the chart, I can quickly see purchases have increased over time, and my in-app purchases have increased my overall sales. However, I want to see more about the documents that were aggregated to create those columns, so I can quickly see details about the transaction amount and location without needing to create another chart or dashboard filter.

In both examples, I was able to answer a deeper-level question that the original chart couldn’t answer on its own. We hope this new feature helps you and your stakeholders get more out of MongoDB Charts, regardless of whether you’re new to it or have been visualizing your Atlas data with it for months, if not years! If you haven’t tried Charts yet, you can get started for free by signing up for MongoDB Atlas and deploying a free tier cluster.

April 6, 2021
Developer

Atlas as a Service

Many of our customers provide MongoDB as a service to their development teams, where developers can request a MongoDB database instance and receive a connection string and credentials in minutes. As those customers move to MongoDB Atlas, they are similarly interested in providing the same level of timely service to their developers. Atlas has a very powerful control plane for provisioning clusters. In large organizations that have thousands of developers, however, it is not always practical to give so many people direct access to that interface. The goal of this article is to show how the Atlas APIs can be used to provide MongoDB as a service when MongoDB is managed by Atlas.

Specifically, we’ll demonstrate how an interface could be created that offers developers a set of choices for a MongoDB database instance. For simplicity, this represents how to provide developers with a list of memory and storage options to configure their cluster. Other options like cloud provider and region are abstracted away. We also demonstrate how to add labels to the Atlas clusters, which is a feature that the Atlas UI doesn’t support. For example, we’ve added a label for cluster description.

Architecture

Although the Atlas APIs could be called directly from the client interface, we elected to use a three-tier architecture, the benefits being:

- We can control the extent of functionality to what is needed.
- We can simplify the APIs exposed to the front-end developers.
- We can more granularly secure the API endpoints.
- We could take advantage of other backend features such as Triggers, Twilio integration, etc.

Naturally, we selected Realm to host the middle tier.

Implementation

Backend

Atlas API

The Atlas APIs are wrapped in a set of Realm Functions. For the most part, they all call the Atlas API as follows (this is getOneCluster):

/*
 * Gets information about the requested cluster. If clusterName is empty, all clusters will be fetched.
 * See https://docs.atlas.mongodb.com/reference/api/clusters-get-one
 */
exports = async function(username, password, projectID, clusterName) {

  const arg = {
    scheme: 'https',
    host: 'cloud.mongodb.com',
    path: 'api/atlas/v1.0/groups/' + projectID + '/clusters/' + clusterName,
    username: username,
    password: password,
    headers: {'Content-Type': ['application/json'], 'Accept-Encoding': ['bzip, deflate']},
    digestAuth: true
  };

  // The response body is a BSON.Binary object. Parse it and return.
  response = await context.http.get(arg);

  return EJSON.parse(response.body.text());
};

You can see each function’s source on GitHub.

MiniAtlas API

The next step was to expose the functions as endpoints that a frontend could use. Alternatively, we could have called the functions using the Realm Web SDK, but we elected to stick with the more familiar REST protocol for our frontend developers. Using Realm Third-Party Services, we developed the following 6 endpoints:

API                       Method   Endpoint
Get Clusters              GET      /getClusters
Create a Cluster          POST     /createCluster
Get Cluster State         GET      /getClusterState?clusterName:cn
Modify a Cluster          PATCH    /modifyCluster
Pause or Resume Cluster   POST     /pauseCluster
Delete a Cluster          DELETE   /deleteCluster?clusterName:cn

Here’s the source for getClusters.
Note that it pulls the username and password from the Values & Secrets:

/*
 * GET getClusters
 *
 * Query Parameters: None
 *
 * Response - Currently all values documented at https://docs.atlas.mongodb.com/reference/api/clusters-get-all/
 */
exports = async function(payload, response) {

  var results = [];

  const username = context.values.get("username");
  const password = context.values.get("apiKey");
  projectID = context.values.get("projectID");

  // Sending an empty clusterName will return all clusters.
  var clusterName = '';

  response = await context.functions.execute("getOneCluster", username, password, projectID, clusterName);
  results = response.results;

  return results;
};

You can see each webhook’s source on GitHub. When the webhook is saved, a Webhook URL is generated, which is the endpoint for the API.

API Endpoint Security

Only authenticated users can execute the API endpoints. The caller must include an Authorization header containing a valid user id, which the endpoint passes through this script function:

exports = function(payload) {
  const headers = context.request.requestHeaders;
  const { Authorization } = headers;
  const user_id = Authorization.toString().replace(/^Bearer/, '');
  return user_id;
};

MongoDB Realm includes several built-in authentication providers including anonymous users, email/password combinations, API keys, and OAuth 2.0 through Facebook, Google, and Apple ID. For this exercise, we elected to use Google OAuth, primarily because it’s already integrated with our SSO provider here at MongoDB. The choice of provider isn’t important. Whatever provider or providers are enabled will generate an associated user id that can be used to authenticate access to the APIs.

Frontend

The frontend is implemented in JQuery and hosted on Realm.

Authentication

The client uses the MongoDB Stitch Browser SDK to prompt the user to log into Google (if not already logged in) and sets the user's Google credentials in the StitchAppClient:

let credential = new stitch.GoogleRedirectCredential();
client.auth.loginWithRedirect(credential);

Then, the user id required to be sent in the API call to the backend can be retrieved from the StitchAppClient as follows:

let userId = client.auth.authInfo.userId;

And set in the header when calling the API. Here’s an example calling the createCluster API:

export const createCluster = (uid, data) => {
  let url = `${baseURL}/createCluster`;

  const params = {
    method: "post",
    headers: {
      "Content-Type": "application/json;charset=utf-8",
      ...(uid && { Authorization: uid })
    },
    ...(data && { body: JSON.stringify(data) })
  };

  return fetch(url, params)
    .then(handleErrors)
    .then(response => response.json())
    .catch(error => console.log(error));
};

You can see all the API calls in webhooks.js.

Tips

We had great success using Postman Team Workspaces to share and validate the backend APIs.

Conclusions

This prototype was created to demonstrate what’s possible, which by now you hopefully realize is anything! The guts of the solution are here - how you wish to extend it is up to you. Learn more about the MongoDB Atlas APIs.

March 2, 2021
Developer

MongoDB Connector for Apache Kafka 1.4 Available Now

As businesses continue to embrace event-driven architectures and tackle Big Data opportunities, companies are finding great success integrating Apache Kafka and MongoDB. These two complementary technologies provide the power and flexibility to solve these large-scale challenges. Today, MongoDB continues to invest in the MongoDB Connector for Apache Kafka, releasing version 1.4! Over the past few months, we’ve been collecting feedback and learning how to best help our customers integrate MongoDB within the Apache Kafka ecosystem. This article highlights some of the key features of this new release.

Selective Replication in MongoDB

Being able to track just the data that has changed is an important use case in many solutions. Change Data Capture (CDC) has been available on the sink since the original version of the connector. However, up until version 1.4, CDC events could only be sourced from MongoDB via the Debezium MongoDB Connector. With the latest release, you can specify the MongoDB Change Stream Handler on the sink to read and replay MongoDB events sourced from MongoDB using the MongoDB Connector for Apache Kafka. This feature enables you to record insert, update, and delete activities on a namespace in MongoDB and replay them on a destination MongoDB cluster. In effect, you have a lightweight way to perform basic replication of MongoDB data via Kafka.

Let’s dive in and see what is happening under the hood. Recall that when the connector is used as a source to MongoDB, it starts a change stream on a specific namespace. Depending on how you configure the source connector, documents that match your criteria are written into a Kafka topic based on this namespace and pipeline. These documents are by default in the change stream event format. Here is a partial message in the Kafka topic that was generated from the following statement:

db.Source.insert({proclaim: "Hello World!"});

{
  "schema": { "type": "string", "optional": false },
  "payload": {
    "_id": { "_data": "82600B38...." },
    "operationType": "insert",
    "clusterTime": { "$timestamp": { "t": 1611348141, "i": 2 } },
    "fullDocument": {
      "_id": { "$oid": "600b38ad6011ef6265c3acd1" },
      "proclaim": "Hello World!"
    },
    "ns": { "db": "Tutorial3", "coll": "Source" },
    "documentKey": { "_id": { "$oid": "600b38ad6011ef6265c3acd1" } }
  }
}

Now that our change stream message is in the Kafka topic, we can use the connector as a sink to read the stream of messages and replay them at the destination cluster. To set up the sink to consume these events, set the “change.data.capture.handler" property to the new com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler.

Notice that one of the fields is “operationType”. The sink connector only supports insert, update, and delete operations on the namespace and does not support actions like creation of database objects such as users, namespaces, indexes, views, and other metadata that occurs in more traditional replication solutions. In addition, this capability is not intended as a replacement for a full-featured replication system, as it cannot guarantee transactional consistency between the two clusters. That said, if all you are looking to do is move data and you can accept the lack of consistency, then you have a simple solution using the new ChangeStreamHandler. To work through a tutorial on this new feature, check out Tutorial 3 of the MongoDB Connector for Apache Kafka Tutorials in GitHub.
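For orientation, a minimal sink configuration that replays these change stream events on a destination cluster might look like the following sketch (the topic name, connection URI, and target database and collection are placeholders, not taken from the tutorial):

"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "Tutorial3.Source",
"connection.uri": "mongodb://destination-cluster:27017",
"database": "Tutorial3",
"collection": "Source",
"change.data.capture.handler": "com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler"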
Dynamic Namespace Mapping

When we use the MongoDB connector as a sink, we take data that resides on a Kafka topic and insert it into a collection. Prior to 1.4, once this mapping was defined, it wasn’t possible to route topic data to another collection. In this release, we added the ability to dynamically map a namespace to the contents of the Kafka topic message.

For example, consider a Kafka topic “Customers.Orders” that contains the following messages:

{"orderid": 1, "country": "ES"}
{"orderid": 2, "country": "US"}

We would like these messages to be placed in their own collection based upon the country value. Thus, the message with the field “orderid” that has a value of 1 will be copied into a collection called “ES”. Likewise, the message with the field “orderid” that has a value of 2 will be copied to a collection called “US”.

To see how we configure this scenario, we will define a sink using the new namespace.mapper property configured with a value of “com.mongodb.kafka.connect.sink.namespace.mapping.FieldPathNamespaceMapper”. Using this mapper, we can use a key or value field to determine the database and collection, respectively. In our example above, let’s define our config using the value of the country field as the collection name to sink to:

{"name": "mongo-dynamic-sink",
 "config": {
   "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
   "topics": "Customers.Orders",
   "connection.uri": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017",
   "database": "Orders",
   "collection": "Other",
   "value.converter": "org.apache.kafka.connect.json.JsonConverter",
   "value.converter.schemas.enable": "false",
   "namespace.mapper": "com.mongodb.kafka.connect.sink.namespace.mapping.FieldPathNamespaceMapper",
   "namespace.mapper.value.collection.field": "country"
}}

Messages that do not have a country value will by default be written to the namespace defined in the configuration, just like they would have been without the mapping. However, if you want messages that do not conform to the map to generate an error, simply set the property namespace.mapper.error.if.invalid to true. This will raise an error and stop the connector when messages cannot be mapped to a namespace due to missing fields or fields that are not strings.

If you’d like to have more control over the namespace, you can use the new “getNamespace” method of the interface com.mongodb.kafka.connect.sink.namespace.mapping.NamespaceMapper. Implementations of this method can implement more complex business rules and can access the SinkRecord or SinkDocument as part of the logic to determine the destination namespace.

Dynamic Topic Mapping

Once the source connector is configured, change stream events flow from the namespace defined in the connector to a Kafka topic. The name of the Kafka topic is made up of three configuration parameters: topic.prefix, database, and collection. For example, if you had the following as part of your source connector configuration:

"topic.prefix": "Stocks",
"database": "Customers",
"collection": "Orders"

the Kafka topic that would be created would be “Stocks.Customers.Orders”. However, what if you didn’t always want the events in the Orders collection to go to this specific topic? What if you wanted to determine at run-time which topic a specific message should be routed to? In 1.4, you can now specify a namespace map that defines which Kafka topic a namespace should be written to.
For example, consider the following map:

{"Customers": "CustomerTopic", "Customers.Orders": "Orders"}

This will map all change stream documents from the Customers database to CustomerTopic.<collectionName>, apart from any documents from the Customers.Orders namespace, which map to the Orders topic. If you need to use complex business logic to determine the route, you can implement the getTopic method in the new TopicMapper class to handle this mapping logic.

Also note that 1.4 introduced a topic.suffix configuration property in addition to topic.prefix. Using our example above, you can configure:

"topic.prefix": "Stocks",
"database": "Customers",
"collection": "Orders",
"topic.suffix": "US"

This will define the topic to write to as “Stocks.Customers.Orders.US”.

Next Steps

- Download the latest MongoDB Connector for Apache Kafka 1.4 from the Confluent Hub!
- Read the MongoDB Connector for Apache Kafka documentation.
- Questions/Need help with the connector? Ask the Community.
- Have a feature request? Provide feedback or file a JIRA.

February 9, 2021
Developer

4 Steps to Success: From Surviving with Legacy Systems to Thriving with MongoDB

Legacy data migrations imply a change in the status quo. More often than not, when an organization finally undertakes a thorough analysis of its technology landscape, it arrives at the same decision: to do nothing. It is an understandably daunting task to upgrade or replace 20+ year-old applications and their database counterparts. But there are good reasons, beyond the tri-annual hardware upgrade, to propel those legacy monoliths of the 1990s into the 21st century.

Companies that prevailed—and even triumphed—in the volatile spring of 2020 were those that transitioned to a more flexible usage model and were therefore able to adjust their business models more rapidly and reliably. MongoDB’s client, Sanoma, was one of the winners. Sanoma was able to scale from 3,000 to 150,000 users within 24 hours, without any service interruption. Innovation and modernization go hand in hand. However, while modernization can sadly occur without innovation, the opposite is simply not possible.

A bit of history

The concept of bringing data together through online data layers (ODL) or operational data stores (ODS) isn't new or specific to MongoDB. Accessing legacy systems, bringing data together, and making it all more easily accessible was a common goal even 20 years ago, and led to the search for the golden source of truth (i.e. the definitive master source for any given entity). This search proved elusive early on due to the hurdles involved with bringing data from diverse, over-structured relational constructs to a sole target called an Operational Data Store (ODS) or Online Data Layer (ODL).

The industry’s first attempts began with object-oriented databases, then with the dead end of XML data stores. (In my personal opinion, XQuery and XPath were never meant for real developers.) After both endeavors failed, then came the wave of Apache efforts I like to call “Hadoop Solves the Planet,” in which companies dumped all their structured data onto a big-data treasure trove. Unfortunately, this resulted in a data desert rather than the data lake everybody was hoping for, since organizations then had to scramble to build a concept for secondary indexing, data dictionaries, and more, on top of having to rebuild the sensible structures they lost.

In the 2010s, the document model, in conjunction with JSON notation, emerged as the new de facto standard. MongoDB release 3.x introduced the combination of ACID (atomicity, consistency, isolation, durability) compliance and support for a broad range of data types (in BSON, for those in the know). Soon, the MongoDB team started implementing additional features of relational heritage: secondary indexing, ACID transactions, aggregations and manipulations of data in place, materialized views, joins, unions... the list goes on.

Where we are now

MongoDB documents can be enriched through different means and channels without touching the content — the consistency of all data and data lineage is implicitly guaranteed. A typical example is the extraction of a delivery address through a supply chain application and a billing address through an enterprise resource planning system. In many cases, those two systems have different requirements. MongoDB documents simply keep both instantiations intact and can even hold multiples of each attached to one single client profile without the need to complete loads and transformations, foreign keys, and all the other ingredients of the relational past. MongoDB simply adds and leverages other sources without destroying their context.
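To make the delivery-address/billing-address example concrete, here is a minimal sketch (the collection, field names, and values are hypothetical) of a single client profile document that keeps both instantiations intact, each tagged with the system it came from:

db.clientProfiles.insertOne({
  _id: "client-4711",
  name: "Acme Retail GmbH",
  addresses: [
    {
      type: "delivery",
      source: "supply-chain-app",   // instantiation extracted from the supply chain application
      street: "12 Warehouse Road",
      city: "Hamburg"
    },
    {
      type: "billing",
      source: "erp-system",         // instantiation extracted from the ERP system
      street: "1 Finance Plaza",
      city: "Munich"
    }
  ]
});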
MongoDB delivers an ODS and ODL experience while streamlining the time-consuming journey of replacing legacy application code. The data platform of true modernization and innovation has arrived!

How your company can get here

The entire journey can be summarized in four simple steps:

1. Analysis: Where do I start my data journey to drive the fastest value?
2. Scaffolding: How do I get my data out of the existing platform and bridge it to the new platform?
3. Coding: How do I enter the world of adjusting and adapting my applications landscape?
4. Innovation: Which are the easiest targets for my company to start achieving true innovation?

The following sections answer these four questions and provide you with a starting point for your journey toward a new and improved solution landscape.

Step 1: Analysis of your existing solution landscape

Data Provisioning

Data provisioning—the act of bringing data from source system(s) to target system—is actually the easy part of this step. Opinions may vary as to the very best approach, but most existing models for streaming data in real time make the process elegant and allow for a business-driven decision, from real-time replication on one end to batches of .CSV files on the other end.

Application onboarding

More exciting is the application onboarding phase, inclusive of the selection and design of initial data domains. Here, simple mechanisms derived from the classic priority concepts can assist—and yes, they existed long before computers. Data domains already exist as objects in the business logic, represented through their objects in the various programming languages. But even the most talented application developer deals with constant changes, which lead to compromises in those objects and can obfuscate the original clarity of their design, so the objects may hide in plain sight. Unearthing those gems and aligning them to the ODS is the most important step toward true legacy modernization.

The simplest solution is actually the most practical one: load an object with the existing software and persist it into a MongoDB collection. Persisting the object takes two lines of code that can be easily added (the first opens a connection to the database; the second persists the object), and their location does not matter as long as they run after the object is built out. This is the first time you will see the beauty of MongoDB and MQL at work. You really have to do nothing for the object itself—e.g. no decomposition or abstraction layer. MongoDB takes care of it for you. When looking at the object in the MongoDB database, e.g. using MongoDB Compass, you will realize that it already looks a lot like the domain object you wanted. The actual task of mapping objects to domains, or subsets of domains, is now mostly driven by the application use case.

Tip: How to leverage application mapping to accelerate onboarding

In the model below, which was taken from the financial industry but can easily be adopted across industries, we identify the data domains in various applications and map their behavior to the effort it takes to locate them as well as their importance to the app. First, each domain gets a rating for its object complexity, where “complexity” is defined by the implementation team. This is similar to the concept of “poker” in a development sprint. Second, each data domain must be located in the application content. Then, it’s tally time.
As we can see in the example above, the concept of schedules looks quite easy but is superseded by the client profiles, which have a touch more application context (spoiler: those always come out on top). Based on the combination of complexity and the number of data domains affecting an application, we can now easily achieve the model below. Agile is your friend and, assuming a certain “point capacity,” the applications fall into place for their conversion schedule in a quite neutral fashion. The development team will then start with the low-hanging fruit. As soon as applications 1, 6, and 7 are ported, we’re in business in a new, modern landscape. Along the journey, the domains will get cleaned up naturally, as we no longer have the static corset of the RDBMS table designs.

Step 2: Scaffolding

Scaffolding is the art of building a bridge that can hold people as they cross it, then immediately dissipate once they step off. But for that critical time, it needs to hold. The same is true for the connectivity between a legacy system and a new data platform. Starting with the first sprint, we have data residing in the MongoDB data platform. If the data is limited to new applications and resides exclusively in MongoDB, nothing needs to be done. However, as shown in the client profiles example above, there may be dependencies to consider. The synchronization between the legacy database and the new MongoDB platform can be easily arranged using microservices and the same concepts used for the initial loading of data. Synchronization can also be achieved through “the gate” if only READ data is needed during the first sprint, or if you’re already dealing with WRITE and the requirement to synchronize those writes back to a legacy system.

Streaming: A streaming-based solution is a great option for uni-directional, read-only operations in the most simple way.

Service: Selecting a simple, tiny microservice is a good option for the use case where data needs to be selectively written. It works using the document model on the MongoDB side, but can still push necessary updates back to the legacy system, and vice-versa. The great news is that this service potentially exists already, as it requires nothing more than using the old database interface from the legacy application on one side and the new, easy-to-digest JSON document format on the MongoDB side. If both databases are ACID-compliant, any transaction is automatically treated as a normal application interaction on both sides.

“Y-Loader”: Another option is a true “Y-loader,” where all transactions are written in sync to both databases in parallel, and the actual transaction is only considered committed when both systems report their commit and completion. Simple two-phase protocols (write to both, wait five seconds, read both to validate and, if in sync, commit to the application) are available as ready-made services through various distributed transaction coordinators, but often it’s easier to use the existing data access in the application. In that case, the new data path to MongoDB runs in parallel, and a simple redundant checkpoint (which the application logic would have had for the legacy path anyway) is expanded for this purpose.

Step 3: Coding

Coding with the new domain data model, and the flexible MongoDB document model as the underlying base, will immediately impact the coding of the business logic and application development. The operative word is immediately.
As the data gets unlocked with the initial persistence of the code object to the MongoDB collection, the developer is simultaneously able to code based on business requirements. Developers will no longer be hindered by the references and requirements of object mappers. As the objects are represented through the MongoDB idiomatic drivers, each programming object resides directly in the data collection; in reverse, any changes to the business logic object will be naturally represented—code-free—in the MongoDB collection.

A single blog post can't resolve all open questions and edge cases. Each application, client, and data interface is unique. Databases possess historic technical debt and implicit assumptions that become lost in generations of developers over time. “Do not touch this section—not sure what it does but last time we tried all hell broke loose…” is often-heard advice around the organizational water cooler. But the key lesson? There are many different templates available and very simple methods of quickly taking the lead toward significant success. For example, a German client, who was stuck in a combination of IBM DB2 (mainframe and distributed) with a significant Hadoop footprint, was amazed when they realized they could “lift” their data one microservice at a time. This resulted in business requirements shifting from “impossible to do” for some requested queries to “completed in under one second” within a single week of a proof-of-concept. This is no exception. Cases and changes like these are made daily, reinforcing Mark Twain’s sage advice that “The secret of getting ahead is getting started."

Step 4: Innovation

As the migration from the legacy environment continues, innovation will be the new focus. The unlocking of previously siloed data allows immediate coupling of real-time data with machine learning platforms for various purposes: e.g. scoring for financial decision-making, personalization for retail, or optimization of production processes in the IoT context. New applications and solutions can easily be created on top of the unleashed data, even with various programming languages, direct real-time dashboards created with MongoDB Charts, and different paradigms (again, MongoDB’s idiomatic drivers do magic!).

At this time, the discussion with the product owners in your squads and tribes (trying to be real modern here) begins with the questions “What is the highest priority component to change?” and “What function is required to enable this change?”

Is it worth waiting much longer? The real question is: why did we all not start sooner? It’s time to begin integrating the list of features you always dreamed of having, but never dared to pursue. The MongoDB team is here to help you get started. Reach out today and let’s discuss the best path forward. To learn more about modernizing to MongoDB, click here.

January 27, 2021
Developer

What’s new in MongoDB for VS Code

A few months ago, we introduced MongoDB for VS Code, an extension to quickly connect to MongoDB and Atlas and work with your data right inside your code editor. Since then, over 85,000 of you have installed the extension, and based on your feedback, we improved the extension quite a bit and released a few new versions that added new functionality and extended the existing one. With this week’s release, we close the loop on what you can do with MongoDB for VS Code:

- Choose the database and collection you want to query
- Make sure that you have the right indexes in place (and create new indexes if you don't)
- Search for documents with playgrounds
- Update your documents directly in the editor

All of this with a workflow that is well integrated with the native VS Code functionality and its shortcuts.

Index View

When you work with your data in MongoDB, no matter if you are building a cool application or writing a playground for an analytics query, you want to make sure your queries are covered by the right indexes. In MongoDB for VS Code that’s extremely easy to do: just select the collection you want in the tree view and all the information you need is right there. And if you see that the index you need is missing, we can prefill a playground with the command you need to create it.

Quick Access to Document Search

Once the right indexes are in place, jumping into a prefilled playground to find documents is just one click away. From there, you can customize your query to find the documents you need. Results of a query – or of any playground, really – can also be saved into a file for later use or to share with your colleagues. Just hit Ctrl/Cmd+S like you’d normally do with any file in VS Code and you are done.

Edit Documents

After you’ve found the documents you were looking for, you can open each one in its own editor view, edit it, and save it back into MongoDB. Document editing was our most requested feature, and we are happy to have finally shipped it in our most recent release of MongoDB for VS Code and to have built it in a way that fits within the natural VS Code user experience.

VS Code Playgrounds + Node.js API + NPM Modules

What I described above is a normal flow when you work with a database: pick your db and collection, find the documents you are interested in, edit them, and finally save the changes back to the database. MongoDB playgrounds are much more powerful than that, though. First of all, you can use them to run any command that you’d run in the MongoDB shell: this means that playgrounds are effectively a shell replacement and a great way to write, edit, and save long shell scripts in a full-featured editor. Second, in playgrounds you have the entire Node.js API available to you, and with a bit more work you can even require any module from NPM inside your playground code.

Here’s how it’s done. Go to VS Code’s local settings folder ($HOME/.vscode on Linux/macOS or %USERPROFILE%/.vscode on Windows) and install a module from NPM. We’ll use cowsay as an example:

$ npm i cowsay

Now, go back to VS Code, connect to a MongoDB server or to an Atlas cluster, and create the following playground:

const cowsay = require('cowsay');

cowsay.say({
  text: db.version()
});

If everything works as expected, a cute cow should tell you what version of the server you are currently using. As you can see, the result of a playground does not have to be JSON documents. It can really be anything, and you can use any node modules out there to format it in the way you need it.
This coupled with other VS Code extensions (one that I like a lot is Charts.js Preview for example) can make VS Code a powerful tool to query and analyze your data stored in MongoDB.

Try it Now!

If you are a VS Code user, getting started with MongoDB for VS Code is easy:

- Install the extension from the marketplace
- Get a free Atlas cluster if you don’t have a MongoDB server already
- Connect to it and start doing cool stuff with playgrounds

You can find more information about MongoDB for VS Code and all its features in the documentation. Is there anything else you’d like to see in MongoDB for VS Code? Join in the discussion at the MongoDB Community Forums, and share your ideas using the MongoDB Feedback Engine.

January 26, 2021
Developer

Add Interactivity to Your Embedded Analytics with Click Events

MongoDB Charts’ data visualizations can now become more interactive, so users and stakeholders can dive deeper into the insights they care most about. That’s possible with a new feature currently in beta with support for most chart types: click events.

A click event in the Charts embedding SDK is simply a notification that a user clicked on a chart. That click could be anything: they might have clicked on a bar in a bar chart, a chart’s legend, or even empty white space within the chart. Web developers can subscribe to these events and create different functionality depending on where the user clicked.

Why might you want to enhance your app or embedded analytics workflow with click event data? Click-event data opens up a wide range of possibilities. Here are a couple of examples, inspired by various Charts users who’ve been telling us how they’d like to use click-event data.

Open up another chart, based on a user clicking on data within a chart: A logistics company has a bar chart that shows pending orders at an aggregate level per region, and they want to see more detail on pending orders for a specific region. They create a click event handler in their application that opens up a new chart with pending orders per supplier, based on the region selected in the aggregate chart.

Filtering the other charts on a dashboard when a series or data point on a single chart is clicked: A retail clothing company has a dashboard with various shopping cart information, such as sales, orders processed, and returns, for their portfolio of products. The head of outerwear sales only wants to see data for the “outerwear” category of products, so they click on the “outerwear” series within a bar chart. The rest of the dashboard adapts so that it shows only information relevant to outerwear.

The example below is created from one of our sample data sets. We created two charts in a single app, tied to the sample movie data set that every Atlas user can access. On the left is a stacked bar chart with category-level data that includes genre and decade. On the right is a table chart that shows each individual movie within a category. Clicking on a specific category in the bar chart updates the movies shown in the table chart.

How can you get started with click events of embedded charts?

If you haven’t yet used the embedding SDK for MongoDB Charts, you’ll want to familiarize yourself with the docs, consider watching this video tutorial, and access the SDK via the Charts GitHub repository. Regardless of whether you’re new to using the SDK or have experience with it, it’s important to note that you will need to use the @mongodb-js/charts-embed-dom@beta tagged version of the SDK to have access to the click events functionality while it’s in beta.

There are two examples specifically for click events in the repository: click-events-basic and click-events-filtering. If you just want to explore and test out the functionality, you can play around with them in the following sandboxes using codesandbox.io:

- Click events basic sandbox
- Click events filtering sandbox

Here’s a snapshot of the data surfaced in a click event that is available for developers to use in their apps. In this example I clicked on the yellow section of the top bar in the Movie Genres chart above. Note how it includes details of the click coordinates, the type and role of the clicked mark, and the corresponding chart data for the mark.
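For orientation, here is a rough sketch of how an app might subscribe to these clicks with the embedding SDK (the baseUrl, the container element, and the surrounding setup are placeholders for your own project; the examples linked above are the authoritative reference):

// Rough sketch: render an embedded chart and log the payload of each click.
// baseUrl and the "chart" container element are placeholders.
import ChartsEmbedSDK from "@mongodb-js/charts-embed-dom";

const sdk = new ChartsEmbedSDK({
  baseUrl: "https://charts.mongodb.com/charts-example-abcde"
});
const chart = sdk.createChart({
  chartId: "90a8fe84-dd27-4d53-a3fc-0e40392685dd" // the chart referenced in the payload below
});

async function renderWithClickLogging() {
  await chart.render(document.getElementById("chart"));

  // Subscribe to click events; the handler receives the click payload.
  await chart.addEventListener("click", (payload) => {
    console.log(payload.data, payload.target);
  });
}

renderWithClickLogging().catch(console.error);

The payload the handler receives is the JSON snapshot shown below.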
{
  "chartId": "90a8fe84-dd27-4d53-a3fc-0e40392685dd",
  "event": {
    "type": "click",
    "altKey": false,
    "ctrlKey": false,
    "shiftKey": false,
    "metaKey": false,
    "offsetX": 383,
    "offsetY": 15,
    "clientX": 403,
    "clientY": 99,
    "pageX": 403,
    "pageY": 99,
    "screenX": 756,
    "screenY": 217
  },
  "data": {
    "y": { "label": "Genre", "value": "Drama" },
    "x": { "label": "# Movies", "value": 3255 },
    "color": {
      "label": "Decade",
      "value": "2010 - 2020",
      "lowerBound": 2010,
      "upperBound": 2020
    }
  },
  "target": {
    "type": "rect",
    "role": "mark",
    "fill": "#F0D175"
  },
  "apiVersion": 1
}

Whether you’re an avid user or new to MongoDB Charts, we hope you consider taking advantage of the new click event capability to increase the interactivity of Charts. It’s in beta because there is more functionality still to come; it has yet to be released for a few chart types: geospatial, table, top item, word cloud, and number charts. On that note, we’d love to hear your thoughts through the MongoDB Feedback Engine. If you haven’t tried Charts yet, you can get started for free by signing up for MongoDB Atlas and deploying a free tier cluster.

January 20, 2021
Developer
