MongoDB Developer

Coding with MongoDB - news for developers, tips and deep dives

MongoDB Connector for Apache Kafka 1.4 Available Now

As businesses continue to embrace event-driven architectures and tackle Big Data opportunities, companies are finding great success integrating Apache Kafka and MongoDB. These two complementary technologies provide the power and flexibility to solve these large-scale challenges. Today, MongoDB continues to invest in the MongoDB Connector for Apache Kafka by releasing version 1.4! Over the past few months, we’ve been collecting feedback and learning how to best help our customers integrate MongoDB within the Apache Kafka ecosystem. This article highlights some of the key features of this new release.

Selective Replication in MongoDB

Being able to track just the data that has changed is an important use case in many solutions. Change Data Capture (CDC) has been available on the sink since the original version of the connector. However, up until version 1.4, CDC events could only be sourced from MongoDB via the Debezium MongoDB Connector. With the latest release you can specify the MongoDB Change Stream Handler on the sink to read and replay MongoDB events sourced from MongoDB using the MongoDB Connector for Apache Kafka. This feature enables you to record insert, update, and delete activities on a namespace in MongoDB and replay them on a destination MongoDB cluster. In effect you have a lightweight way to perform basic replication of MongoDB data via Kafka.

Let’s dive in and see what is happening under the hood. Recall that when the connector is used as a source to MongoDB, it starts a change stream on a specific namespace. Depending on how you configure the source connector, documents that match your namespace and pipeline criteria are written into a Kafka topic. These documents are by default in the change stream event format.
Here is a partial message in the Kafka topic that was generated from the following statement:

db.Source.insert({proclaim: "Hello World!"});

{
  "schema": { "type": "string", "optional": false },
  "payload": {
    "_id": { "_data": "82600B38...." },
    "operationType": "insert",
    "clusterTime": { "$timestamp": { "t": 1611348141, "i": 2 } },
    "fullDocument": {
      "_id": { "$oid": "600b38ad6011ef6265c3acd1" },
      "proclaim": "Hello World!"
    },
    "ns": { "db": "Tutorial3", "coll": "Source" },
    "documentKey": { "_id": { "$oid": "600b38ad6011ef6265c3acd1" } }
  }
}

Now that our change stream message is in the Kafka topic, we can use the connector as a sink to read the stream of messages and replay them at the destination cluster. To set up the sink to consume these events, set the sink's CDC handler property to the new com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler.

Notice that one of the fields is “operationType”. The sink connector only supports insert, update and delete operations on the namespace. It does not support actions such as the creation of database objects (users, namespaces, indexes, views, and other metadata) that more traditional replication solutions handle. In addition, this capability is not intended as a replacement for a full-featured replication system, as it cannot guarantee transactional consistency between the two clusters. That said, if all you are looking to do is move data and you can accept its lack of consistency, then you have a simple solution using the new ChangeStreamHandler. To work through a tutorial on this new feature, check out Tutorial 3 of the MongoDB Connector for Apache Kafka Tutorials in GitHub.

Dynamic Namespace Mapping

When we use the MongoDB connector as a sink we take data that resides on a Kafka topic and insert it into a collection. Prior to 1.4, once this mapping was defined it wasn’t possible to route topic data to another collection. In this release we added the ability to dynamically map a namespace to the contents of the Kafka topic message.
For example, consider a Kafka topic “Customers.Orders” that contains the following messages:

{"orderid":1,"country":"ES"}
{"orderid":2,"country":"US"}

We would like these messages to be placed in their own collection based upon the country value. Thus, the message whose “orderid” field has a value of 1 will be copied into a collection called “ES”. Likewise, the message whose “orderid” field has a value of 2 will be copied into a collection called “US”.

To see how we configure this scenario, we will define a sink using the new namespace.mapper property configured with a value of “com.mongodb.kafka.connect.sink.namespace.mapping.FieldPathNamespaceMapper”. Using this mapper, we can use a key or value field to determine the database and collection respectively. In our example above, let’s define our config using the value of the country field as the collection name to sink to:

{
  "name": "mongo-dynamic-sink",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "Customers.Orders",
    "connection.uri": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017",
    "database": "Orders",
    "collection": "Other",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "namespace.mapper": "com.mongodb.kafka.connect.sink.namespace.mapping.FieldPathNamespaceMapper",
    "namespace.mapper.value.collection.field": "country"
  }
}

Messages that do not have a country value will by default be written to the namespace defined in the configuration, just as they would have been without the mapping. However, if you want messages that do not conform to the map to generate an error, simply set the property namespace.mapper.error.if.invalid to true. This will raise an error and stop the connector when messages cannot be mapped to a namespace due to missing fields or fields that are not strings.
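The routing rules just described can be modeled in a few lines of JavaScript. This is only a sketch of the behavior as stated in this article, not the connector's actual implementation: the function name resolveNamespace is hypothetical, and the config object simply reuses the property names from the sink configuration above.

```javascript
// Simplified model of the field-path namespace mapping described above.
// NOT the connector's source code; resolveNamespace is a hypothetical helper.
function resolveNamespace(config, value) {
  const field = config["namespace.mapper.value.collection.field"];
  const candidate = field === undefined ? undefined : value[field];

  // A string value in the mapped field becomes the collection name.
  if (typeof candidate === "string") {
    return { database: config.database, collection: candidate };
  }

  // Missing or non-string field: fail if strict mode is on...
  if (String(config["namespace.mapper.error.if.invalid"]) === "true") {
    throw new Error(`Cannot map record: field "${field}" is missing or not a string`);
  }

  // ...otherwise fall back to the namespace from the configuration.
  return { database: config.database, collection: config.collection };
}

const sinkConfig = {
  database: "Orders",
  collection: "Other",
  "namespace.mapper.value.collection.field": "country",
};

const es = resolveNamespace(sinkConfig, { orderid: 1, country: "ES" });
const fallback = resolveNamespace(sinkConfig, { orderid: 3 });

console.log(`${es.database}.${es.collection}`);             // Orders.ES
console.log(`${fallback.database}.${fallback.collection}`); // Orders.Other
```

The third message ({ orderid: 3 }) is made up for illustration; it shows the fallback to the configured "Other" collection when no country is present.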
If you’d like to have more control over the namespace you can use the new “getNamespace” method of the interface com.mongodb.kafka.connect.sink.namespace.mapping.NamespaceMapper. Implementations of this method can implement more complex business rules and can access the SinkRecord or SinkDocument as part of the logic to determine the destination namespace.

Dynamic Topic Mapping

Once the source connector is configured, change stream events flow from the namespace defined in the connector to a Kafka topic. The name of the Kafka topic is made up of three configuration parameters: topic.prefix, database and collection. For example, if you had as part of your source connector configuration:

"topic.prefix":"Stocks", "database":"Customers", "collection":"Orders"

The Kafka topic that would be created would be “Stocks.Customers.Orders”. However, what if you didn’t want the events in the Orders collection to always go to this specific topic? What if you wanted to determine at run-time which topic a specific message should be routed to? In 1.4 you can now specify a namespace map that defines which Kafka topic a namespace should be written to. For example, consider the following map:

{"Customers": "CustomerTopic", "Customers.Orders": "Orders"}

This will map all change stream documents from the Customers database to CustomerTopic.<collectionName>, apart from any documents from the Customers.Orders namespace, which map to the Orders topic.

If you need to use complex business logic to determine the route, you can implement the getTopic method in the new TopicMapper class to handle this mapping logic.

Also note that 1.4 introduced a topic.suffix configuration property in addition to the topic.prefix.
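To make the topic naming rules concrete, here is a small JavaScript model of them. It is a sketch under assumptions, not the connector's code: resolveTopic is a hypothetical name, the "Invoices" collection is invented for the example, and applying the prefix and suffix to mapped names as well is an assumption that this article does not confirm.

```javascript
// Simplified model of the topic naming rules described above.
// NOT the connector's source code; resolveTopic is a hypothetical helper.
function resolveTopic({ prefix = "", suffix = "", namespaceMap = {} }, db, coll) {
  const has = (key) => Object.prototype.hasOwnProperty.call(namespaceMap, key);
  const namespace = `${db}.${coll}`;

  let base;
  if (has(namespace)) {
    base = namespaceMap[namespace];          // exact namespace match wins
  } else if (has(db)) {
    base = `${namespaceMap[db]}.${coll}`;    // database-level mapping
  } else {
    base = namespace;                        // default: <database>.<collection>
  }

  // Assumption: prefix and suffix wrap whatever base name was chosen.
  return [prefix, base, suffix].filter((part) => part !== "").join(".");
}

const namespaceMap = { Customers: "CustomerTopic", "Customers.Orders": "Orders" };

console.log(resolveTopic({ namespaceMap }, "Customers", "Orders"));          // Orders
console.log(resolveTopic({ namespaceMap }, "Customers", "Invoices"));        // CustomerTopic.Invoices
console.log(resolveTopic({ prefix: "Stocks" }, "Customers", "Orders"));      // Stocks.Customers.Orders
```

Checking the exact namespace before the database entry is what lets Customers.Orders override the broader Customers rule.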
Using our example above, you can configure:

"topic.prefix":"Stocks", "database":"Customers", "collection":"Orders", "topic.suffix":"US"

This will define the topic to write to as “Stocks.Customers.Orders.US”.

Next Steps

- Download the latest MongoDB Connector for Apache Kafka 1.4 from the Confluent Hub!
- Read the MongoDB Connector for Apache Kafka documentation
- Questions/Need help with the connector? Ask the Community
- Have a feature request? Provide Feedback or file a JIRA

February 9, 2021

4 Steps to Success: From Surviving with Legacy Systems to Thriving with MongoDB

Legacy data migrations imply a change in the status quo. More often than not, when an organization finally undertakes a thorough analysis of its technology landscape, it arrives at the same decision: to do nothing. It is an understandably daunting task to upgrade or replace 20+ year-old applications and their database counterparts. But there are good reasons, beyond the tri-annual hardware upgrade, to propel those legacy monoliths of the 1990s into the 21st century. Companies that prevailed (and even triumphed) in the volatile spring of 2020 were those that transitioned to a more flexible usage model and were therefore able to adjust their business models more rapidly and reliably. MongoDB’s client, Sanoma, was one of the winners. Sanoma was able to scale from 3,000 to 150,000 users within 24 hours, without any service interruption. Innovation and modernization go hand in hand. However, while modernization can sadly occur without innovation, the opposite is simply not possible.

A bit of history

The concept of bringing data together through online data layers (ODL) or operational data stores (ODS) isn't new or specific to MongoDB. Accessing legacy systems, bringing data together, and making it all more easily accessible was a common goal even 20 years ago, and led to the search for the golden source of truth (i.e. the definitive master source for any given entity). This search proved elusive early on due to the hurdles involved with bringing data from diverse, over-structured relational constructs to a sole target called Operational Data Store (ODS) or Online Data Layer (ODL). The industry’s first attempts began with object-oriented databases, then with the dead end of XML data stores. (In my personal opinion, XQuery and XPath were never meant for real developers.) After both endeavors failed came the wave of Apache efforts I like to call “Hadoop Solves the Planet,” in which companies dumped all their structured data onto a big-data treasure trove.
Unfortunately, this resulted in a data desert rather than the data lake everybody was hoping for, since organizations then had to scramble to build a concept for secondary indexing, data dictionaries, and more, on top of having to rebuild the sensible structures they lost. In the 2010s, the document model, in conjunction with JSON notation, emerged as the new de facto standard. MongoDB release 3.x introduced the combination of ACID (atomicity, consistency, isolation, durability) and compliance with a broad range of data types (in BSON, for those in the know). Soon, the MongoDB team started implementing additional features of relational heritage: secondary indexing, ACID transactions, aggregations and manipulations of data in place, materialized views, joins, unions... the list goes on.

Where we are now

MongoDB documents can be enriched through different means and channels without touching the content; the consistency of all data and data lineage is implicitly guaranteed. A typical example is the extraction of a delivery address through a supply chain application and a billing address through an enterprise resource planning system. In many cases, those two systems have different requirements. MongoDB documents simply keep both instantiations intact and can even hold multiples of each attached to one single client profile without the need to complete loads and transformations, foreign keys, and all the other ingredients of the relational past. MongoDB simply adds and leverages other sources without destroying their context. MongoDB delivers an ODS and ODL experience while streamlining the time-consuming journey of replacing legacy application code. The data platform of true modernization and innovation has arrived!

How your company can get here

The entire journey can be summarized in four simple steps:

- Analysis: Where do I start my data journey to drive the fastest value?
- Scaffolding: How do I get my data out of the existing platform and bridge it to the new platform?
- Coding: How do I enter the world of adjusting and adapting my applications landscape?
- Innovation: Which are the easiest targets for my company to start achieving true innovation?

The following sections answer these four questions and provide you with a starting point for your journey toward a new and improved solution landscape.

Step 1: Analysis of your existing solution landscape

Data Provisioning

Data provisioning (the act of bringing data from source system(s) to target system) is actually the easy part of this step. Opinions may vary as to the very best approach, but most existing models for streaming data in real time make the process elegant and allow for a business-driven decision, ranging from real-time replication on one end to batch exchange of .CSV files on the other.

Application onboarding

More exciting is the application onboarding phase, inclusive of the selection and design of initial data domains. Here, simple mechanisms derived from the classic priority concepts can assist, and yes, they existed long before computers. Data domains already exist in the business logic, represented through objects in the various programming languages. But even the most talented application developer deals with constant changes, which lead to compromises in those objects and can obfuscate the original clarity in their design, so the objects may hide in plain sight. Unearthing those gems and aligning them to the ODS is the most important step towards true legacy modernization. The simplest solution is actually the most practical one: load an object with the existing software and persist it into a MongoDB collection. The effort of persisting the object results in two lines of code that can be easily added.
The location of the two lines of code (the first opens a connection to the database; the second persists the object) does not matter as long as it is in a place after the object is built out. This is the first time you will see the beauty of MongoDB and MQL at work. You really have to do nothing for the object itself, e.g. no decomposition or abstraction layer. MongoDB takes care of it for you. When looking at the object in the MongoDB database, e.g. using MongoDB Compass, you will realize that it already looks a lot like the domain object you wanted. The actual task to map objects to domains, or subsets of domains, is now mostly driven by the application use case.

Tip: How to leverage application mapping to accelerate onboarding

In the model below, which was taken from the financial industry but can easily be adopted across industries, we identify the data domains in various applications and map their behavior to the effort it takes to locate them as well as their importance to the app. First, each domain gets a rating for its object complexity, where “complexity” is defined by the implementation team. This is similar to the concept of “poker” in a development sprint. Second, each data domain must be located in the application content. Then, it’s tally time.

As we can see in the example above, the concept of schedules looks quite easy but is superseded by the client profiles, which have a touch more application context (spoiler: those always come out on top). Based on the combination of complexity and the number of data domains affecting an application, we can now easily achieve the model below. Agile is your friend and, assuming a certain “point capacity,” the applications fall into place for their conversion schedule in a quite neutral fashion. The development team will then start with the low-hanging fruit. As soon as applications 1, 6 and 7 are ported, we’re in business in a new modern landscape.
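The tally can be sketched in JavaScript as follows. Every number, domain and application name here is hypothetical (the original figures are not reproduced in this text), and the greedy sprint packing is just one plausible reading of "point capacity," not a prescribed method.

```javascript
// Hypothetical complexity ratings per data domain (assigned by the team).
const domainComplexity = { schedules: 2, clientProfiles: 5, payments: 8 };

// Hypothetical mapping of applications to the data domains they touch.
const applications = {
  "Application 1": ["schedules"],
  "Application 2": ["clientProfiles", "payments"],
  "Application 3": ["schedules", "clientProfiles"],
};

// Tally: an application's score is the summed complexity of its domains.
function scoreApplications(apps, complexity) {
  return Object.entries(apps)
    .map(([name, domains]) => ({
      name,
      points: domains.reduce((sum, domain) => sum + complexity[domain], 0),
    }))
    .sort((a, b) => a.points - b.points); // low-hanging fruit first
}

// Greedily fill sprints up to the point capacity, in score order.
// An application larger than the capacity still gets a sprint of its own.
function planSprints(scored, capacity) {
  const sprints = [];
  let current = [];
  let used = 0;
  for (const app of scored) {
    if (current.length > 0 && used + app.points > capacity) {
      sprints.push(current);
      current = [];
      used = 0;
    }
    current.push(app.name);
    used += app.points;
  }
  if (current.length > 0) sprints.push(current);
  return sprints;
}

const scored = scoreApplications(applications, domainComplexity);
const plan = planSprints(scored, 10);
console.log(JSON.stringify(plan)); // [["Application 1","Application 3"],["Application 2"]]
```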
Along the journey, the domains will get cleaned up naturally as we do not have the static corset of the RDBMS table designs.

Step 2: Scaffolding

Scaffolding is the art of building a bridge that can hold people as they cross it, then immediately dissipate once they step off. But for that critical time, it needs to hold. The same is true for the connectivity between a legacy system and a new data platform. Starting with the first sprint, we have data residing in the MongoDB data platform. If the data is limited to new applications and resides exclusively in MongoDB, nothing needs to be done. However, as shown in the client profiles example above, there may be dependencies to consider. The synchronization between the legacy database and the new MongoDB platform can be easily arranged using microservices and the same concepts used for the initial loading of data. Synchronization can also be achieved through “the gate” if only READ data is needed during the first sprint, or if you’re already dealing with WRITE and the requirement to synchronize those writes back to a legacy system.

- Streaming: A streaming-based solution is a great option for uni-directional operations that allow read only in the simplest way.
- Service: Selecting a simple, tiny microservice is a good option for the use case where data needs to be selectively written. It works using the document model on the MongoDB side, but can still push necessary updates back to the legacy system, and vice-versa. The great news is that this service potentially exists already, as it requires nothing more than using the old database interface from the legacy application on one side and the new, easy-to-digest JSON document format on the MongoDB side. If both databases are ACID-compliant, any transaction is automatically treated as a normal application interaction on both sides.
- “Y-Loader”: Another option is a true “Y-loader,” where all transactions are written in sync to both databases in parallel, and the actual transaction is only considered committed when both systems report their commit and completion. Simple two-phase protocols (write to both, wait five seconds, read both to validate and, if in sync, commit to application) are available as ready-made services through various distributed transaction coordinators, but often it’s easier to use the existing data access in the application. In that case, the new data path to MongoDB is in parallel, and a simple redundant checkpoint (which the application logic would have had for the legacy path anyway) is expanded for this purpose.

Step 3: Coding

The coding with the new domain data model, as well as the MongoDB flexible document model as the underlying base, will immediately impact the coding for the business logic and application development. The operative word is immediately. As the data gets unlocked with the initial persistence of the code object to the MongoDB collection, the developer is simultaneously able to code based on business requirements. Developers will no longer be hindered by the references and requirements of object mappers. As the objects are represented through the MongoDB idiomatic drivers, each programming object resides directly in the data collection; in reverse, any changes to the business logic object will be naturally represented, code-free, in the MongoDB collection.

A single blog post can't resolve all open questions and edge cases. Each application, client, and data interface is unique. Databases possess historic technical debt and implicit assumptions that become lost in generations of developers over time. “Do not touch this section. Not sure what it does but last time we tried all hell broke loose…” is often-heard advice around the organizational water cooler. But the key lesson?
There are many different templates available and very simple methods of quickly taking the lead to significant success. For example, a German client, who was stuck in a combination of IBM DB2 (mainframe and distributed) with a significant Hadoop footprint, was amazed when they realized they could “lift” their data one microservice at a time. This resulted in business requirements shifting from “impossible to do” for some requested queries to “completed in under one second” within a single week of a proof-of-concept. This is no exception. Cases and changes like these are made daily, reinforcing Mark Twain’s sage advice that “The secret of getting ahead is getting started.”

Step 4: Innovation

As the migration from the legacy environment continues, innovation will be the new focus. The unlocking of previously siloed data allows immediate coupling of real-time data with machine learning platforms for various purposes: e.g. scoring for financial decision-making, personalization for retail, or optimization of production processes in the IoT context. New applications and solutions can easily be created on top of the unleashed data, even with various programming languages, direct real-time dashboards created with MongoDB Charts, and different paradigms (again, MongoDB’s idiomatic drivers do magic!). At this time, the discussion with the product owners in your squads and tribes (trying to be real modern here) begins with the questions “What is the highest priority component to change?” and “What function is required to enable this change?”

Is it worth waiting much longer?

The real question is: why did we all not start sooner? It’s time to begin integrating the list of features you always dreamed of having, but never dared to pursue. The MongoDB team is here to help you get started. Reach out today and let’s discuss the best path forward. To learn more about modernizing to MongoDB, click here.

January 27, 2021

What’s new in MongoDB for VS Code

A few months ago, we introduced MongoDB for VS Code, an extension to quickly connect to MongoDB and Atlas and work with your data right inside your code editor. Since then, over 85,000 of you have installed the extension, and based on your feedback, we improved the extension quite a bit and released a few new versions that added new functionality and extended the existing one.

With this week’s release, we close the loop on what you can do with MongoDB for VS Code:

- Choose the database and collection you want to query
- Make sure that you have the right indexes in place (and create new indexes if you don't)
- Search for documents with playgrounds
- Update your documents directly in the editor

All of this with a workflow that is well integrated with the native VS Code functionality and its shortcuts.

Index View

When you work with your data in MongoDB, no matter if you are building a cool application or if you are writing a Playground for an analytics query, you want to make sure your queries are covered by the right indexes. In MongoDB for VS Code that’s extremely easy to do: just select the collection you want in the tree view and all the information you need is right there. And if you see that the index you need is missing, we can prefill a playground with the command you need to create it.

Quick Access to Document Search

Once the right indexes are in place, jumping into a prefilled playground to find documents is just one click away. From there, you can just customize your query to find the documents you need. Results of a query (or of any playground, really) can also be saved into a file for later use or to share with your colleagues. Just hit Ctrl/Cmd+S like you’d normally do with any file in VS Code and you are done.

Edit Documents

After you’ve found the documents you were looking for, you can open each one in its own editor view, edit it and save it back into MongoDB.
Document editing was our most requested feature, and we are happy to have finally shipped it in our most recent release of MongoDB for VS Code and to have built it in a way that fits within the natural VS Code user experience.

VS Code Playgrounds + Node.js API + NPM Modules

What I described above is a normal flow when you work with a database: pick your db and collection, find the documents you are interested in, edit them and finally save the changes back to the database. MongoDB playgrounds are much more powerful than that though. First of all, you can use them to run any command that you’d run in the MongoDB shell: this means that playgrounds are effectively a shell replacement and a great way to write, edit and save long shell scripts in a full-featured editor. Second, in playgrounds, you have the entire Node.js API available to you and with a bit more work you can even require any module from NPM inside your playground code. Here’s how it’s done.

Go to VS Code’s local settings folder ($HOME/.vscode on Linux/macOS or %USERPROFILE%/.vscode on Windows) and install a module from NPM. We’ll use cowsay as an example:

$ npm i cowsay

Now, go back to VS Code, connect to a MongoDB server or to an Atlas cluster and create the following playground:

const cowsay = require('cowsay');
cowsay.say({ text : db.version() });

If everything works as expected, a cute cow should tell you what version of the server you are currently using. As you can see, the result of a playground does not have to be JSON documents. It can really be anything, and you can use any node modules out there to format it in the way you need it. This, coupled with other VS Code extensions (one that I like a lot is Charts.js Preview, for example), can make VS Code a powerful tool to query and analyze your data stored in MongoDB.

Try it Now!
If you are a VS Code user, getting started with MongoDB for VS Code is easy:

- Install the extension from the marketplace
- Get a free Atlas cluster if you don’t have a MongoDB server already
- Connect to it and start doing cool stuff with playgrounds

You can find more information about MongoDB for VS Code and all its features in the documentation. Is there anything else you’d like to see in MongoDB for VS Code? Join in the discussion at the MongoDB Community Forums, and share your ideas using the MongoDB Feedback Engine.

January 26, 2021

Add Interactivity to Your Embedded Analytics with Click Events

MongoDB Charts’ data visualizations can now become more interactive, so users and stakeholders can dive deeper into the insights they care more about. That’s possible with a new feature currently in beta with support for most Charts types: click events. A click event in the Charts embedding SDK is simply a notification that a user clicked on a chart. That click could be anything: They might have clicked on a bar in a bar chart, a chart’s legend, or even empty white space within the chart. Web developers can subscribe to these events and create different functionality depending on where the user clicked.

Why might you want to enhance your app or embedded analytics workflow with click event data?

Click-event data opens up a wide range of possibilities. Here are a couple of examples, inspired by various Charts users who’ve been telling us how they’d like to use click-event data.

- Open up another chart, based on a user clicking on data within a chart: A logistics company has a bar chart that shows pending orders at an aggregate level per region, and they want to see more detail on pending orders for a specific region. They create a click event handler in their application that opens up a new chart with pending orders per supplier, based on the region selected in the aggregate chart.
- Filtering the other charts on a dashboard when a series or data point on a single chart is clicked: A retail clothing company has a dashboard with various shopping cart information such as sales, orders processed, and returns, for their portfolio of products. The head of outerwear sales only wants to see data for the “outerwear” category of products, so they click on the “outerwear” series within a bar chart. The rest of the dashboard can adapt so that it shows only information relevant to outerwear.

The example below is created from one of our sample data sets. We created two charts in a single app, tied to the sample movie data set that every Atlas user can access.
On the left is a stacked bar chart with category level data that includes genre and decade. On the right is a table chart that shows each individual movie within a category. Clicking on a specific category in the bar chart updates the movies shown in the table chart.

How can you get started with click events of embedded charts?

If you haven’t yet used the embedding SDK for MongoDB Charts, you’ll want to familiarize yourself with the docs, consider watching this video tutorial, and access the SDK via the Charts GitHub repository. Whether you’re new to the SDK or have experience with it, it’s important to note that you will need to use the @mongodb-js/charts-embed-dom@beta tagged version of the SDK to have access to the click events functionality while it’s in beta.

There are two examples specifically for click events in the repository: click-events-basic and click-events-filtering. If you just want to explore and test out the functionality, you can play around with them in the following sandboxes:

- Click events basic sandbox
- Click events filtering sandbox

Here’s a snapshot of the data surfaced in a click event that is available for developers to use in their apps. In this example I clicked on the yellow section of the top bar in the Movie Genres chart above. Note how it includes details of the click coordinates, the type and role of the clicked mark, and the corresponding chart data for the mark.
{
  "chartId": "90a8fe84-dd27-4d53-a3fc-0e40392685dd",
  "event": {
    "type": "click",
    "altKey": false,
    "ctrlKey": false,
    "shiftKey": false,
    "metaKey": false,
    "offsetX": 383,
    "offsetY": 15,
    "clientX": 403,
    "clientY": 99,
    "pageX": 403,
    "pageY": 99,
    "screenX": 756,
    "screenY": 217
  },
  "data": {
    "y": { "label": "Genre", "value": "Drama" },
    "x": { "label": "# Movies", "value": 3255 },
    "color": { "label": "Decade", "value": "2010 - 2020", "lowerBound": 2010, "upperBound": 2020 }
  },
  "target": { "type": "rect", "role": "mark", "fill": "#F0D175" },
  "apiVersion": 1
}

Whether you’re an avid user or new to MongoDB Charts, we hope you consider taking advantage of the new click event capability to increase the interactivity of Charts. It’s in beta because there is more functionality still to come. It has yet to be released for a few chart types: geospatial, table, top item, word cloud, and number charts. On that note, we’d love to hear your thoughts through the MongoDB Feedback Engine.

If you haven’t tried Charts yet, you can get started for free by signing up for MongoDB Atlas and deploying a free tier cluster.
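As an illustration of what an application might do with the click payload shown earlier, the following JavaScript sketch turns it into a MongoDB-style filter that a second chart could apply. The function name filterFromClick and the field names "genres" and "year" are hypothetical, and treating upperBound as exclusive is an assumption. Registering the handler and applying the filter are done through the embedding SDK and are left out here; see the SDK docs for those calls.

```javascript
// Hypothetical helper: derive a filter from a click event payload.
// Returns null when the click did not land on a data mark (legend, white space).
function filterFromClick(payload, fieldNames) {
  if (!payload.target || payload.target.role !== "mark" || !payload.data) {
    return null;
  }
  const { y, color } = payload.data;
  const filter = {};

  // The category axis value (here: the genre) becomes an equality match.
  if (y) filter[fieldNames.y] = y.value;

  // A binned series (here: the decade) becomes a range match.
  // Assumption: lowerBound is inclusive and upperBound is exclusive.
  if (color && typeof color.lowerBound === "number" && typeof color.upperBound === "number") {
    filter[fieldNames.color] = { $gte: color.lowerBound, $lt: color.upperBound };
  }
  return filter;
}

// Trimmed copy of the payload from this article.
const clickPayload = {
  data: {
    y: { label: "Genre", value: "Drama" },
    x: { label: "# Movies", value: 3255 },
    color: { label: "Decade", value: "2010 - 2020", lowerBound: 2010, upperBound: 2020 },
  },
  target: { type: "rect", role: "mark", fill: "#F0D175" },
};

const movieFilter = filterFromClick(clickPayload, { y: "genres", color: "year" });
console.log(JSON.stringify(movieFilter));
// {"genres":"Drama","year":{"$gte":2010,"$lt":2020}}
```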

January 20, 2021

Modernize data between siloed data warehouses with Infosys Data Mesh and MongoDB

The Data Challenge in Digital Transformation

Enterprises that embark on a digital transformation often face significant challenges with accessing data in a timely manner, an issue that can quickly impede customer satisfaction. To deliver the best digital experience for customers, companies must create the right engagement strategy. This requires all relevant data available in the enterprise be readily accessible. For example, when a customer contacts an insurance company, it is important that the company has a comprehensive understanding of the customer’s background as well as any prior interactions, so they can orchestrate the best possible experience.

Data is available in both BI (Business Intelligence) systems, like Enterprise Data Warehouses, and OI (Operational Intelligence) systems, like policy and claim systems. There is a need to bring these BI and OI systems together to avoid any disruption to the digital functions that may delay synchronization. Data removed from an operational system loses context. Re-establishing this domain context and providing persona-based access to the data requires domain-oriented, decentralized data ownership, as well as architecture.

Ultimately, organizations seek to use data as a key to fueling the products and services they provide their customers. This data should minimize the cost of customer research, but the data needs to be trusted and high quality. Companies need access to these siloed sources of data in a seamless self-service approach across various product life cycles.

The Challenge of Centralized Data

Historically, businesses have handled large amounts of data from various sources by ingesting it all into a centralized database (data warehouse, data lake, or data lake on cloud). They would then feed insight drivers, like reporting tools and dashboards as well as online transaction processing applications, from that central repository.
The challenge with this approach is the broken link between analytical systems and transactional systems that impedes the digital experience. Centralized systems, like data warehouses, introduce latency and fail to meet the real-time response and performance levels needed to build next-generation digital experiences.

What is Infosys Data Mesh?

Data Mesh helps organizations bridge the chasm between analytics and application development teams within large enterprises. Data Mesh is an architecture pattern that takes a new approach to domain-driven distributed architecture and the decentralization of data. Its basic philosophy is to encapsulate the data, its relationships, context, and access functionality into a data product with guaranteed quality, trust, and ease of use for business consumption.

Data Mesh is best suited for low-latency access to data assets used in digital transformations that are intended to improve experience through rich insights. With its richer domain flavor (distributed ownership, manageability, and low-latency access), Data Mesh is best positioned as a bridge between transactional (consuming applications) and analytical systems.

This diagram depicts the high-level solution view of Data Mesh:

Data Mesh Key Design Principles

- Domain-first approach.
- Data as a product. Data Mesh products share the following attributes, which maximize usability and minimize friction:
  - Self-described: metadata is precise and accurate
  - Discoverable and addressable: products are uniquely identifiable and easy to find
  - Secure and well-governed: only those who are granted access have it
  - Trustworthy: proper data quality controls are applied, SLA/SLOs are maintained
  - Open standard and interoperable: data formats such as XBRL and JSON
- Build new products easily. Any cross-functional team can build a new, enterprise-level product in an existing domain and/or from existing products.
- Simplified access for multiple technology stacks. Polyglot data and ports, cloud and non-cloud.
- Common infrastructure and services for all data pipelines and catalogs.

Platform Requirements for a Data Mesh

To build a data mesh, companies need a database platform that can create domain-driven data products that meet various enterprise needs. This includes:

- Flexible data structures, to accommodate new behaviors
- An API-driven construct, to access current data products and build new domain-data ones
- Support for high-performance query on large-scale data structures
- A shared, scalable infrastructure

Why MongoDB is the Optimal Platform for Infosys Data Mesh

MongoDB is the best platform for realizing the Infosys Data Mesh architecture and powering analytics-driven enterprises because it provides:

- A flexible document model and a poly-cloud infrastructure availability so teams can easily modify and enrich flat or hierarchical data models
- MongoDB Realm Webhooks to create service APIs which connect data across products and enable consumption needs based on business context
- A scalable, shared infrastructure and support for high-performance querying of large-scale data

Service APIs for constructing Infosys Data Mesh

Two use cases:

Case 1: A wealth management firm offers a variety of products to its customers, such as checking and savings accounts, trading, credit and debit cards, insurance, and investment vehicles.

Challenges:

- Each product is serviced by a different system and technology infrastructure
- Internal consumers of this data have different needs: product managers analyze product performance, wealth managers and financial advisors rely on customer-centric analytics, and financial control teams track the firm’s revenue performance

Solution: Using the Infosys Data Mesh model, the firm’s data owners create domain-data products categorized by customer and product, and then curate and publish them through a technology-agnostic, API-driven service layer. Consumers can then use this service layer to build the data products they need to carry out their business functions.

Case 2: The Risk and Finance unit of a large global bank has multiple regional data lakes catering to each region’s management information system and analytical needs. This poses multiple challenges for creating global data products.

Challenges:

- Technology varies across regions
- ETL can become less advantageous depending on circumstance
- Regulations govern cross-regional data transfer policies

Solution: To address these challenges, the bank creates an architecture of regional data hubs for region-specific products and, as with Case 1, makes those products available to authorized consumers through a technology-agnostic, API-driven service layer. Next, it implements an enterprise data catalog with an easy-to-use search interface on top of the API layer. The catalog’s query engine executes cross-hub queries, creating a self-service model for users to seamlessly discover and consume data products and to align newer ones with their evolving business needs. Enterprise security platform integration ensures that all regulatory and compliance requirements are fully met.
How Businesses Overall Can Benefit

- Data and insights become pervasive and consumable across applications and personas
- Speed-to-insights (including real time) enables newer digital experiences and better engagement, leading to superior business results
- Self-service through trusted data products is enabled

Infosys DNA Assets on MongoDB Accelerate the Creation of Industry-Specific Domain Data Products

- Infosys Genome: Creates the foundation for Data Mesh by unifying semantics across industries
- Infosys Data Prep: Guides consumers through the product creation process with a scalable data preparation framework
- Infosys Marketplace: Enables discovery and consumption of domain-data products via an enterprise data catalog

Download our Modernization Guide for information about which applications are best suited for modernization and tools to get started.

January 14, 2021

Legacy Modernization with MongoDB and Confluent

In many organizations, crucial enterprise data is locked in dozens or hundreds of silos that may be controlled by different teams and stuck in systems that aren’t able to serve new workloads or access patterns. This is a blocker for innovation and insight, ultimately hampering the business. For example, imagine building a new mobile app for your customers that enables them to view their account data in a single view. Designing the app could require months simply to navigate the internal processes necessary to gain access to the legacy systems, and even more time to figure out how to integrate them.

An Operational Data Layer, or ODL, can offer a “best of both worlds” approach, providing the benefits of modernization without the risk of a full rip and replace. Legacy systems are left intact (at least at first), meaning that existing applications can continue to work as usual without interruption. New or improved data consumers will access the ODL rather than the legacy data stores, protecting those stores from new workloads that may strain their capacity and expose single points of failure. At the same time, building an ODL offers a chance to redesign the application’s data model, allowing for new development and features that aren’t possible with the rigid tabular structure of existing relational systems.

With an ODL, it’s possible to combine data from multiple legacy sources into a single repository where new applications, such as a customer single view or artificial intelligence processes, can access the entire corpus of data. Existing workloads can gradually shift to the ODL, delivering value at each step. Eventually, the ODL can be promoted to a system of record and legacy systems can be decommissioned. Read our blog covering DaaS with MongoDB and Confluent to learn more.
There’s also a push today for applications and databases to be entirely cloud-based, but the reality is that current business applications are often too complex to be migrated easily or completely. Instead, many businesses are opting to move application data between on-premises and cloud deployments in an effort to leverage the full advantage of public cloud computing without having to undertake a complete, massive data lift-and-shift.

Confluent can be used for both one-time and real-time data synchronization between legacy data sources and modern data platforms like MongoDB, whose fully managed global cloud database service, MongoDB Atlas, is supported across AWS, Google Cloud, and Azure. Confluent Platform can be self-managed in your own data center while Confluent Cloud can be used on the public clouds. Whether leaving your application on-premises is a personal choice or a corporate mandate, there are many good reasons to integrate with MongoDB Atlas.

- Bring your data closer to your users in more than 70 regions with Atlas’s global clusters
- Address your most intense workloads with one-click, automated sharding for scale out and zero-downtime scale up
- Quickly provision TBs of database storage, all on high performance SSDs with dedicated I/O bandwidth
- Natively query and analyze data across AWS S3 and MongoDB Atlas with MongoDB Atlas Data Lake
- Perform full-text search queries with MongoDB Atlas Search
- Build native mobile applications that seamlessly synchronize data with MongoDB Realm
- Create powerful visualizations and dashboards of your MongoDB data with MongoDB Charts
- Off-load older data to cost-effective storage with MongoDB Atlas Online Archive

In this video we show a one-time migration and real-time continuous data synchronization from a relational system to MongoDB Atlas using Confluent Platform and the MongoDB Connector for Apache Kafka. We also talk about different ways to store and consume the data within MongoDB Atlas.
The Git repository for the demo is here. Learn more about the MongoDB and Confluent partnership here and download the joint Reference Architecture here. Click here to learn more about modernizing to MongoDB.

January 7, 2021

Finding Inspiration and Motivation at MongoDB University

For many people across the globe, 2020 was a strange and challenging year. The new year has brought the hope of healthier and more prosperous times ahead, but inspiration to stay positive can still be tough to find. For MongoDB Certified Developer Kirk-Patrick Brown, the past months presented obstacles, but with perseverance he also experienced growth and even found ways to give back to his local community using what he learned at MongoDB University. Kirk-Patrick sat down with us virtually, from his home in Jamaica, to talk about his passion for MongoDB, getting certified through MongoDB University in the middle of the pandemic, and staying motivated. Can you tell us about yourself and your approach to software development? I’m Kirk-Patrick Brown, a senior software developer at Smart Mobile Solutions Jamaica. I consider myself an artist. I have a history in martial arts and poetry. I medaled in the Jamaica Taekwondo Championships and received the certificate of merit in a creative writing competition hosted by the Jamaica Cultural Development Commission. It was only natural to bring those artistic traits when moving into software development. For me, software development is also an artistic pursuit. It gives me a canvas to create and bring ideas to life, which in turn brings business value. When did you begin building with MongoDB? I had my first hands-on experience with MongoDB in 2018. I realized it was one of those rare gems that you see, and you're immediately curious about how it actually works, because it’s not like what you’re used to. In Jamaica there are a lot of organizations that run on some form of relational database. But once I learned about MongoDB and NoSQL I became a self-motivated evangelist for MongoDB.
I understand that organizations may have used relational databases in the past, and that is understandable because there is a longer history and at one time that was the main type of database for your typical workload, but things have changed drastically. In this era there is more demand for data and all different types of unstructured data. With the advent of big data, systems that were designed years ago may not be able to provide optimal storage and performance. MongoDB is a better alternative for such use cases and enables built-in features such as auto-sharding to horizontally scale and aid in the efficient storage and retrieval of big data. MongoDB keeps being so innovative. The other day I was preparing for a multicloud accreditation with Aviatrix, and it was so funny: at the very same time, MongoDB came out with multi-cloud clusters. It was just beautiful. You don’t want to get locked into one cloud provider for your deployments. Even though these cloud providers offer availability zones for increased fault tolerance, things can still happen. Becoming multi-cloud allows you to become more resilient to disaster. Being in multiple clouds also lets you bring some of your replica set members closer geographically to your customers. By leveraging regional presences across multiple clouds, you can reduce network latency and increase your ability to fulfill queries faster. That’s one of the main features of MongoDB replication: the ability to configure a member to be of higher priority than others, which could be driven by the location in which most of your queries originate. Multi-cloud clusters enable high availability and performance, and I think it was amazing of MongoDB to create such a feature. You call yourself a “self-motivated evangelist” for MongoDB. We’re flattered! What has your experience been? I’m actively trying to get organizations to appreciate NoSQL. Recently I presented to a group of developers in the agile space.
I spoke to them about replication, sharding, indexes, performance, and how MongoDB ties into advanced features of security in terms of authentication. I’m primarily pushing for developers and organizations to appreciate the Atlas offering from MongoDB. Right out of the box you can instantly have a deployed database out there in Atlas, with the click of a button, pretty much. You can get up and running immediately because MongoDB is a cloud-first database. Plus there's always customer support, even at the free tiers. You don’t feel alone with your database when you’re using MongoDB Atlas. There has been some resistance, because NoSQL requires a bit of a mental shift to understand what it can provide. But we live in a world where things continually change. If you are not open to adapting I don’t even have to say what’s going to happen, you know? You became MongoDB Certified through MongoDB University in the middle of the pandemic. Can you tell us about that experience? Even before the pandemic started I was studying courses at MongoDB University, and traveling 100 kilometers to go to work every week, while also caring for my family and three-year-old son back at home. There were some delays, but I was able to become MongoDB-certified in July 2020. Becoming MongoDB-certified has impacted me in positive ways. I’ve met people I did not know before. It has also given me a level of confidence as it relates to building a database that is highly available, scalable, and provides good data reads via the different types of indexes and indexing techniques provided by MongoDB. I can create the database, perform and optimize CRUD operations, and apply security and performance activities alongside a highly available and scalable cluster, all thanks to the knowledge provided by MongoDB University. The courses at MongoDB University covered those aspects very well.
There is enough theory but also a great amount of practical application in the courses, so you leave with working knowledge that you can immediately use. What is the project you worked on during the pandemic that you’re most proud of? One of the things I’ve worked on intensely during the pandemic is helping to develop a video verification application for a local company and building out most of the backend functionality. For that project, there was a great deal of research needed into the technological tools and implementation to support recording verification videos of customers. I felt like it was my contribution to society at a time when it was dangerous for people to come into that local business. If I can develop something that allows even one person not to need to come into that physical location, that could be the difference between someone contracting the virus or not. A virus that has taken many lives and disrupted a lot of families this year. What advice do you have for other developers who are struggling right now with motivation to advance themselves and their careers? Don’t ever give up. In anything that you do. There is nothing that you’ll do that’s going to be both good and easy. Being a developer, you experience different problems that you have to solve but you have to keep moving forward. I don’t believe in failure, because in anything you do, there is always a win. You have your experiences and those experiences can guide your decision making. It’s just like machine learning. Machines need a lot of data and you can’t give the machine all positive data. It needs some negative data for it to become a good training model. You need bad experiences as well as good ones. If we had all good experiences our brains would not have the training models to make those correct decisions when we need them. Each day I make one definite step or positive decision. 
And that may be as simple as going onto the MongoDB University site and saying “I’m going to complete this one course.” You just have to keep going at it. You plan for a lot of things in life, but things most of the time don’t happen when you want them to. There's going to be some delay or something. But you can’t give up. Because if you give up then everything is lost. As long as there is time and there is life then there is opportunity to keep doing this thing. And it may take a little bit to get there but eventually you will. But if you give up, you definitely won’t!

January 6, 2021

Run Secure Containerized MongoDB Deployments Using the MongoDB Community Kubernetes Operator

First introduced earlier this year, the MongoDB Community Kubernetes Operator now allows you to run secure MongoDB deployments in your Kubernetes cluster. The Community Operator is open source, and ideally suited for experimentation, testing, and lightweight production use cases. For larger or more mission-critical workloads with requirements around monitoring, alerting, and data recovery, we recommend the MongoDB Enterprise Kubernetes Operator, available with MongoDB Enterprise Advanced.

This blog tutorial will show you how to deploy and configure a fully secure MongoDB deployment inside Kubernetes from scratch, using the MongoDB Community Kubernetes Operator and cert-manager. The Community Operator is available here.

Installation

The MongoDB Community Kubernetes Operator allows you to deploy secure MongoDB Replica Sets in your Kubernetes cluster. Before we can deploy MongoDB, we need to ensure that we have created the required CustomResourceDefinition.

Note: This operation requires cluster admin permissions.

    kubectl apply -f

Create a namespace for our deployment.

    kubectl create namespace mongodb

Install the latest version of the operator.

    kubectl apply -f

Note: If using OpenShift, make sure to reference the OpenShift samples instead.

Deploying a SCRAM Enabled Replica Set

The Community Operator creates secure SCRAM-SHA-256 enabled deployments by default. This means that we need to define our user and what roles we want them to have, alongside a set of credentials for the user to use. We can create the user's credentials in the form of a Kubernetes Secret:

    kubectl create secret generic my-mongodb-user-password -n mongodb --from-literal="password=TXs3ZsuIqT-pQFvwxOec"

Once we have created the secret, we can deploy a MongoDB replica set.
    ---
    apiVersion:
    kind: MongoDB
    metadata:
      name: mongodb-replica-set
      namespace: mongodb
    spec:
      members: 3
      type: ReplicaSet
      version: "4.4.0"
      security:
        authentication:
          modes: ["SCRAM"]
      users:
        - name: my-mongodb-user
          db: admin
          passwordSecretRef:
            name: my-mongodb-user-password # the name of the secret we created
          roles: # the roles that we want the user to have
            - name: readWrite
              db: myDb
          scramCredentialsSecretName: mongodb-replica-set

Note: If your application is in the same namespace, it can use this secret to connect to the MongoDB instance. If your application gets the credentials some other way, this secret can be deleted. If you want to change this user's password in the future, you can simply create a new secret with the same name, or reference a different secret in the resource definition.

Once the MongoDB resource has been created, the operator will create and configure a StatefulSet for this replica set. You'll notice that each pod consists of two containers: the mongod itself, and the mongodb-agent, which runs in a sidecar and handles automation of the mongod processes.

Once the MongoDB resource has been created, we can wait for the replica set to get into the “Running” state.

    NAME                                           READY   STATUS    RESTARTS   AGE
    mongodb-kubernetes-operator-5d757df5c8-d6ll7   1/1     Running   0          5m21s
    mongodb-replica-set-0                          2/2     Running   0          2m56s
    mongodb-replica-set-1                          2/2     Running   0          2m10s
    mongodb-replica-set-2                          2/2     Running   0          72s

Connecting to the Replica Set

Once the resource has been successfully created, we can connect and authenticate to the MongoDB replica set as the user we defined in the resource specification.
Now you can connect to the replica set from your application using the following connection string:

    USERNAME="my-mongodb-user"
    PASSWORD="$(kubectl get secret my-mongodb-user-password -o jsonpath='{.data.password}' | base64 -d)"
    CONNECTION_STRING="mongodb://${USERNAME}:${PASSWORD}@mongodb-replica-set-0.mongodb-replica-set-svc.mongodb.svc.cluster.local:27017,mongodb-replica-set-1.mongodb-replica-set-svc.mongodb.svc.cluster.local:27017,mongodb-replica-set-2.mongodb-replica-set-svc.mongodb.svc.cluster.local:27017"

We can also connect directly through the mongo shell.

    MONGO_URI="$(kubectl get mdb mongodb-replica-set -o jsonpath='{.status.mongoUri}')"
    kubectl exec -it mongodb-replica-set-0 -c mongod -- mongo ${MONGO_URI} --username "${USERNAME}" --password "${PASSWORD}"

Note: As our user only has access to the "myDb" database, we only have permissions to read and write to this database.

    use myDb
    db.col.insert({ "hello": "world" })

Configure TLS with Jetstack's Cert Manager

Cert-manager is a Kubernetes add-on from Jetstack which automates the management and issuing of TLS certificates. The Community Operator is fully compatible with the cert-manager certificate format.

Install cert-manager into your cluster:

Generate a certificate authority that will issue the certificates for our replica set.

    openssl genrsa -out ca.key 2048

Generate a CA certificate, or use your own.

Note: If using your own CA certificate, you'll need to make sure a few requirements are met:

- It has a filename of "ca.crt"
- The common name either matches the domain name of all replica set members or has the domain name of all replica set members. View the docs for more details.

When generating the CA certificate, we'll use a wildcard Common Name that matches the domain name of all of the replica set members.
    COMMON_NAME="*.mongodb-replica-set-svc.mongodb.svc.cluster.local"
    openssl req -x509 -new -nodes -key ca.key -subj "/CN=${COMMON_NAME}" -days 3650 -reqexts v3_req -extensions v3_ca -out ca.crt

Create a Kubernetes ConfigMap containing the CA.

    kubectl create configmap ca-config-map --from-file=ca.crt --namespace mongodb

Create a Kubernetes secret containing the signing pair in the mongodb namespace.

    kubectl create secret tls ca-key-pair --cert=ca.crt --key=ca.key --namespace=mongodb

Once we have created our key pair, we can create a cert-manager issuer resource from it, which will issue the certificates for our MongoDB deployment. Create the following cert-manager Issuer custom resource and save it as cert-manager-issuer.yaml.

    ---
    apiVersion:
    kind: Issuer
    metadata:
      name: ca-issuer
      namespace: mongodb
    spec:
      ca:
        secretName: ca-key-pair

Next, apply the resource definition.

    kubectl apply -f cert-manager-issuer.yaml

Create a cert-manager Certificate resource definition which references the newly created issuer, and save it as cert-manager-certificate.yaml.

    ---
    apiVersion:
    kind: Certificate
    metadata:
      name: cert-manager-certificate
      namespace: mongodb
    spec:
      secretName: mongodb-tls
      issuerRef:
        name: ca-issuer
        kind: Issuer
      commonName: "*.mongodb-replica-set-svc.mongodb.svc.cluster.local"
      organization:
        - MongoDB

Apply the certificate resource.

    kubectl apply -f cert-manager-certificate.yaml

Shortly after we create the Certificate resource, we should see the "mongodb-tls" secret which was created by cert-manager.

    kubectl get secret mongodb-tls
    NAME          TYPE   DATA   AGE
    mongodb-tls          3      1m

Without making any modifications to the secret, we can update our MongoDB resource to configure TLS. We just need to reference both the ConfigMap containing the CA, and the secret that cert-manager generated containing the certificates for our deployment.
    ---
    apiVersion:
    kind: MongoDB
    metadata:
      name: mongodb-replica-set
      namespace: mongodb
    spec:
      members: 3
      type: ReplicaSet
      version: "4.4.0"
      security:
        tls:
          enabled: true
          certificateKeySecretRef:
            name: mongodb-tls
          caConfigMapRef:
            name: ca-config-map
        authentication:
          modes:
            - SCRAM
      users:
        - name: my-mongodb-user
          db: admin
          passwordSecretRef:
            name: my-mongodb-user-password
          roles:
            - name: readWrite
              db: myDb
          scramCredentialsSecretName: mongodb-replica-set

Shortly after the updated MongoDB resource has been applied, we should see the members of the replica set back in the “Running” state. Each member will have been restarted once, as changing the TLS configuration results in a rolling restart of the deployment.

    NAME                                           READY   STATUS    RESTARTS   AGE
    mongodb-kubernetes-operator-5d757df5c8-d6ll7   1/1     Running   0          3h16m
    mongodb-replica-set-0                          2/2     Running   1          13m
    mongodb-replica-set-1                          2/2     Running   1          13m
    mongodb-replica-set-2                          2/2     Running   1          12m

Once the changes have all been applied, we can test our connection over TLS by connecting to any of the mongod containers:

    kubectl exec -it mongodb-replica-set-0 -c mongod -- bash

And using the mongo shell to connect using TLS:

    mongo --tls --tlsCAFile /var/lib/tls/ca/ca.crt --tlsCertificateKeyFile /var/lib/tls/server/*.pem --host mongodb-replica-set-0.mongodb-replica-set-svc.mongodb.svc.cluster.local

In this blog tutorial we:

- Deployed the MongoDB Community Kubernetes Operator into our Kubernetes cluster
- Created a secure, SCRAM-SHA enabled MongoDB resource and our password in the form of a Kubernetes secret
- Used cert-manager to create TLS certificates for our MongoDB deployment
- And finally, configured our MongoDB resource to enable TLS for our deployment

To get started yourself, download MongoDB Community Kubernetes Operator here.
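The connection string used earlier in this tutorial follows a regular pattern: each member is addressed as <resource>-<index>.<resource>-svc.<namespace>.svc.cluster.local on port 27017. As a minimal sketch, assuming that naming pattern holds for your deployment, a small shell helper can assemble the string for any member count. The function name build_connection_string and its argument order are hypothetical and not part of the operator or kubectl, and the password shown is a placeholder. Passwords containing special characters would need to be percent-encoded first, which this sketch does not do.

```shell
# Hypothetical helper (not part of the operator): assemble a MongoDB connection
# string for a replica set whose members follow the naming pattern
# <resource>-<i>.<resource>-svc.<namespace>.svc.cluster.local:27017.
# Written for POSIX sh, so the variables below are global to the shell.
build_connection_string() {
  username="$1"; password="$2"; resource="$3"; namespace="$4"; members="$5"
  hosts=""
  i=0
  while [ "$i" -lt "$members" ]; do
    host="${resource}-${i}.${resource}-svc.${namespace}.svc.cluster.local:27017"
    # Join hosts with commas, without a leading comma on the first entry.
    if [ -z "$hosts" ]; then
      hosts="$host"
    else
      hosts="${hosts},${host}"
    fi
    i=$((i + 1))
  done
  printf 'mongodb://%s:%s@%s\n' "$username" "$password" "$hosts"
}

# Example: the three-member replica set from this tutorial, with a placeholder password.
build_connection_string "my-mongodb-user" "example-password" "mongodb-replica-set" "mongodb" 3
```

Generating the host list from the member count keeps the string in step with the "members" value in the resource definition, so the two only need to be changed together in one place.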

December 14, 2020

Part 1: The Modernization Journey with Exafluence and MongoDB

Welcome to the first in a series of conversations between Exafluence and MongoDB about how our partnership can use open source tools and the application of data, artificial intelligence/machine learning, and natural language processing to power your business’s digital transformation. In this installment, MongoDB Senior Partner Solutions Architect Paresh Saraf and Director for WW Partner Presales Prasad Pillalamarri sit down with Exafluence CEO Ravikiran Dharmavaram and exf Insights Co-Founder Richard Robins to discuss how to start the journey to build resilient, agile, and quick-to-market applications.

From Prasad Pillalamarri: I first met Richard Robins, MD & Co-Founder of exf Insights at Exafluence, back in June 2016 at a MongoDB World event. Their approach towards building data-driven applications was fascinating for me. Since then Exafluence has grown by leaps and bounds in the System Integration space and MongoDB has outperformed its peers in the database market. So Paresh and I decided to interview Richard to deep-dive into their perspective on modernization with MongoDB.

Prasad & Paresh: We first met the Exafluence team in 2016. Since then, MongoDB has created the Atlas cloud data platform that now supports multi-cloud clusters and Exafluence has executed multiple projects on mainframe and legacy modernization. Could you share your perspective on the growth aspects and synergies of both companies from a modernization point of view?

Richard Robins: Paresh and Prasad, I’m delighted to share our views with you. We’ve always focused on what happens after you successfully offload read traffic from mainframes and legacy RDBMS to the cloud. That’s digital transformation and legacy app modernization. Early on, Exafluence made a bet that if the development community embraces something we should, too. That’s how we locked in on MongoDB when we formed our company.
Having earned our stripes in the legacy data world, we knew that getting clients to MongoDB would mean mining the often poorly documented IP contained in the legacy code. That code is often where long-retired subject matter expert (SME) knowledge resides. To capture it, we built tools to scan COBOL/DB2 and stored procedures to reverse engineer the current state. This helps us move clients to a modern cloud native application, and it's an effective way to merge, migrate, and retire the legacy data stores all of our clients contend with.

Once we’d mined the IP with those tools we needed to provide forward-engineered transformation rules to reach the new MongoDB Atlas endpoint. Using a metadata-driven approach, we built a rules catalog that included a full audit and REST API to keep data governance programs and catalogs up to date as an additional benefit of our modernization efforts. We’ve curated these tools as exf Insights, and we bring them to each modernization project. Essentially, we applied NLP, ML, and AI to data transformation to improve modernization analysts’ efficiency, and added a low-to-no code transformation rule builder, complete with version control and rollback capabilities.

All this has resulted in our clients getting world-class, resilient capabilities at a lower cost in less time. We’re delighted to say that our modernization projects have been successful by following two simple tenets, both embodied in the accelerator tools we’ve built: embrace what the development community embraces, and offer as much help as possible. That’s why we are so confident we'll continue our rapid growth.

P&P: How do you think re-architecting legacy applications with MongoDB as the core data layer will add value to your business?

RR: We believe that MongoDB Atlas will continue to be the developers’ go-to document database, and that we’ll see our business grow 200-300% over the next three years.
With MongoDB Atlas and Realm we can provide clients with resilient, agile applications that scale, are easily upgraded, and are able to run on any cloud as well as the popular mobile iOS and Android devices. Digital transformation is key to remaining competitive and being agile going forward. With MongoDB Atlas, we can give our clients the same capabilities we all take for granted on our mobile apps: they’re resilient, easy to upgrade, usually real-time, scale via Kubernetes clusters, and can be rolled back quickly if necessary. Most importantly, they save our clients money and can be automatically deployed. P&P: At a high level, how will Exafluence help customers take this journey? RR: We’re unusual as a services firm in that we spend 20% of gross revenue on R&D, so our platform and approach are proven. Thus, relatively small teams for our healthcare, financial services, and industrial 4.0 clients can leverage our approach, platform, and tools to deliver advanced analytical systems that combine structured and unstructured data across multiple domains. We built our exf Insights accelerator platform using MongoDB and designed it for interoperability, too. On projects we often encounter legacy ETL and messaging tools. To show how easy it is, we recently integrated exf Insights with SAP HANA and the SAP Data Intelligence platform. Further, we can publish JSON code blocks and provide Python code for integration into ETL platforms like Informatica and Talend. Our approach is to reverse engineer by mining IP from legacy data estates and then forward engineer the target data estate, using these steps and tools: Reverse Engineer Extract stored procedures, business logic, and technical data from the legacy estate and load it into our platform. Use our AI/ML/NLP algorithms to analyse business transformation logic and metadata, with outliers identified for cleansing. 
Provide DB scans that assess legacy data quality so outliers can be cleansed and corrected, and provide tools to compare DB-level data reconciliations.

Forward Engineer

To produce a clean set of metadata and business transformation logic, and baseline with version control, we:

Extract, transform, and load metadata to the target state.
Score metadata via NLP and ML to recommend matches to the analyst, who accepts, rejects, or overrides recommendations. Analysts can then add further transformations, which are catalogued.
Deploy and load cleansed data to the target state platform so any transformations and gold copies may be built.
Automate data governance via REST API and code block generation (Python/JSON) to provide enterprise catalogs with the latest transforms.

P&P: What are your keys to a successful transformation journey?

RR: Over the past several years we’ve identified these elements and observations:

Subject matter experts and technologists must work together to provide new solutions.
There’s a shortage of skilled technologists able to write, deploy, and securely manage next generation solutions. Using accelerators and transferring skills are vital to mitigating the skills shortage.
Existing IP that’s buried in legacy applications must be understood and mined in order for a modernization program to succeed.
A data-driven approach that combines reverse and forward engineering speeds migration and also provides new data governance and data science catalog capabilities.
The building, care, and feeding of new, open source-enabled applications is markedly different from the way monolithic legacy applications were built.
The document model enables analytics and interoperability.
Cybersecurity and data consumption patterns must be articulated and be part of the process, not afterthoughts.
Even with aggressive transformation plans, new technology must co-exist with legacy applications for some time; progress works best if it’s not a big bang.
Success requires business and technology to learn new ways to provide, acquire, and build agile solutions.

P&P: Can you talk about the solutions you have that will accelerate the modernization journey for customers?

RR: exf Insights helps our clients visualize what’s possible with extensive, pre-built, modular solutions for health care, financial services, and industrial 4.0. They show the power of MongoDB Atlas and also the power of speed layers using Spark and Confluent Kafka. These solutions are readily adaptable to client requirements and reduce the risk and time required to provide secure, production-ready applications.

Source data loading. Analyze and integrate raw structured and unstructured data, including support for reference and transactional data.
Metadata scan. Match data using AI/NLP, scoring results and providing side-by-side comparison.
Source alignment. Use ML to check underlying data and score results for analysts, and leverage that learning to accelerate future changes.
Codeless transformation. Empower data SMEs to build the logic with a multiple-sources-to-target approach and transform rules which support code value lookups and complex Boolean logic. Includes versioned gold copies of any data type (e.g., reference, transaction, client, product, etc.).
Deployment. Deploy for scheduled or event-driven repeatability and dynamically populate Snowflake or other repositories. Generate code blocks that are usable in your estate or REST API.

We used the same 5-step workflow data scientists use when we enabled business analysts to accelerate the retirement of internal data stores to build and deploy the COVID-19 self-checking app in three weeks, including active directory integration and downloadable apps. We will be offering a Realm COVID-19 screening app on web, Android, and iOS to the entire MongoDB Atlas community in addition to our own clients.
The accelerator integrates key data governance tools, including exf Insights repository management of all sources and targets with versioned lineage; as-built transformation rules for internal and client implementations; and a business glossary integrated into metadata repositories.

P&P: Usually one of the key challenges for businesses is data being locked in silos.

RR: We couldn’t agree more. Our data modernization projects routinely integrate with source transactional systems that were never built to work together. We provide scanning tools to understand disparate data as well as ways to ingest, align, and stitch them together.

Using health care as an example, exf Insights provides a comprehensive analytical capability, able to integrate data from hospitals, claims, pharmaceutical companies, patients, and providers. Some of this is non-SQL, such as radiological images; for pharma companies we provide capabilities to support clinical research organizations (CROs) via a follow-the-molecule approach. Of course, we also have to work with and subscribe to Centers for Medicare & Medicaid Services (CMS) guidelines.

Our data migration focuses on collecting the IP behind the data and making the source, logic, and any transformation rules available to our clients. In financial services, it’s critical to understand sources and targets. No matter how data is accessed (federated or direct store), with Spark and Kafka we can talk to just about any data repository.

P&P: Once we discover the data to be migrated, we need to model the data according to MongoDB’s data model paradigm. That requires multiple transformations before data is loaded to MongoDB. Can you explain more about how your accelerators help here?

RR: By understanding data consumption and then looking at existing data structures, we seek to simplify and then apply the capabilities of MongoDB’s document model. It’s not unlike what a data architect would do in the relational world, but with MongoDB Atlas it’s easier.
We ourselves use MongoDB for our exf Insights platform to align, transform, and make data ready for consumption in new applications. We’re able to provide full rules lineage and audit trail, and even support rollback. For the real-time speed layer we use Spark and Kafka as well. This data-driven modernization approach also turns data governance into an active consumer of the rules catalog, so exf Insights works well for regulated industries.

P&P: It’s great that we have data migrated now. Consider a scenario where it’s a mainframe application and we have lots of COBOL code in there. It has to be moved to a new programming language like Python, with a change in the data access layer to point to MongoDB. Do you have accelerators which can facilitate the application migration? If so, how?

RR: Yes, we do have accelerators that understand the COBOL syntax to create JSON and ultimately Java, which speeds modernization. We also found we had to reverse engineer stored procedures as part of our client engagements for Exadata migration.

P&P: Once we migrate the data from legacy databases to MongoDB, validation is the key step. As this is a heterogeneous migration it can be challenging. How can Exafluence add value here?

RR: We’ve built custom accelerators that migrate data from the RDBMS world to MongoDB, and offer data comparisons as clients go from development to testing to production, documenting all data transformations along the way.

P&P: Now that we’ve talked about all your tools which can help in the modernization journey, can you tell us how you have already helped your customers achieve this?

RR: Certainly. We’ve already outlined how we’ve created solution starters for modernization, with sample solutions as accelerators. But that’s not enough; our key tenet for successful modernization projects is pairing SMEs and developers. That’s what enables our joint client and Exafluence teams to understand the business, key regulations, and technical standards.
Our data-driven focus lets us understand the data regardless of industry vertical. We’ve successfully used exf Insights now in financial services, healthcare, and industry 4.0. Whether it’s understanding the nuances of financial instruments and data sources for reference and transactional data, or medical device IoT sensors in healthcare, or shop floor IoT and PLC data for predictive analytics and digital twin modeling, a data-driven approach reduces modernization risks and costs.

Below are some of the possibilities this data-driven approach has delivered for our healthcare clients using MongoDB Atlas. By aggregating provider, membership, claims, pharma, and EHR clinical data, we offer robust reporting that:

Transforms health care data from its raw form into actionable insights that improve member care quality, health outcomes, and satisfaction
Provides FHIR support
Surfaces trends and patterns in claims, membership, and provider data
Lets users access, visualize, and analyze data from different sources
Tracks provider performance and identifies operational inefficiencies

P&P: Thank you, Richard!

Keep an eye out for upcoming conversations in our series with Exafluence, where we'll be talking about agility in infrastructure and data as well as interoperability.

MongoDB and Modernization

To learn more about MongoDB's overall Modernization strategy, read here.

December 9, 2020
