Scale Out Without Fear or Friction: Live Resharding in MongoDB
Live resharding was one of the key enhancements delivered in our MongoDB 5.0 Major Release . With live resharding you can change the shard key for your collection on demand as your application evolves with no database downtime or complex data migrations . In this blog post, we will be covering: Product developments that have made sharding more flexible What you had to do before MongoDB 5.0 to reshard your collection, and how that changed with 5.0 live resharding Guidance on the performance and operational considerations of using live resharding Before that, we should discuss why you should shard at all, and the importance of selecting a good shard key – even though you have the flexibility with live resharding to change it at any time. Go ahead and skip the next couple of sections if you are already familiar with sharding! Why Shard your Database? Sharding enables you to distribute your data across multiple nodes. You do that to: Scale out horizontally — accommodate growing data or application load by sharding once your application starts to get close to the capacity limits of a single replica set. Enforce data locality — for example pinning data to shards that are provisioned in specific regions so that the database delivers low latency local access and maintains data sovereignty for regulatory compliance. Sharding is the best way of scaling databases and MongoDB was developed to support sharding natively. Sharding MongoDB is transparent to your applications and it’s elastic so you can add and remove shards at any time. The Importance of Selecting a Good Shard Key MongoDB’s native sharding has always been highly flexible — you can select any field or combination of fields in your documents to shard on. This means you can select a shard key that is best suited to your application’s requirements. The choice of shard key is important as it defines how data is distributed across the available shards. Ideally you want to select a shard key that: Gives you low latency and high throughput reads and writes by matching data distribution to your application’s data access patterns. Evenly distributes data across the cluster so you avoid any one shard taking most of the load (i.e., a “hot shard”). Provides linear scalability as you add more shards in the future. While you have the flexibility to select any field(s) of your documents as your shard key, it was previously difficult to change the shard key later on. This made some developers fearful of sharding. If you chose a shard key that doesn’t work well, or if application requirements change and the shard key doesn’t work well for its changed access patterns, the impact on performance could be significant. At this point in time, no other mainstream distributed database allows users to change shard keys, but we wanted to give users this ability. Making Shard Keys More Flexible Over the past few releases, MongoDB engineers have been working to provide more sharding flexibility to users: MongoDB 4.2 introduced the ability to modify a shard key’s value . Under the covers the modification process uses a distributed, multi-document ACID transaction to change the placement of a document in a sharded cluster. This is useful when you want to rehome a document to a different geographic region or age data out to a slower storage tier . MongoDB 4.4 went further with the ability to refine the shard key for a collection by adding a suffix to an existing key. Both of these enhancements made sharding more flexible, but they didn’t help if you needed to reshard your collection using an entirely different shard key. Manual Resharding: Before MongoDB 5.0 Resharding a collection was a manual and complex process that could only be achieved through one of two approaches: Dumping the entire collection and then reloading it into a new collection with the new shard key . This is an offline process, and so your application is down until data reloading is complete — for example, it could take several days to dump and reload a 10 TB+ collection on a three-shard cluster. Undergoing a custom migration that involved writing all the data from the old cluster to a new cluster with the resharded collection. You had to write the query routing and migration logic, and then constantly check the migration progress to ensure all data had been successfully migrated. Custom migrations entail less downtime, but they come with a lot of overhead. They are highly complex, labor-intensive, risky, and expensive (as you had to run two clusters side-by-side). It took one MongoDB user three months to complete the live migration of 10 billion documents. How this Changed with MongoDB 5.0: Live Resharding We made manual resharding a thing of the past with MongoDB 5.0. With 5.0 you just run the reshardCollection command from the shell, point at the database and collection you want to reshard, specify the new shard key, and let MongoDB take care of the rest. reshardCollection: "<database>.<collection>", key: <shardkey> When you invoke the reshardCollection command, MongoDB clones your existing collection into a new collection with the new shard key, then starts applying all new oplog updates from the existing collection to the new collection. This enables the database to keep pace with incoming application writes. When all oplog updates have been applied, MongoDB will automatically cut over to the new collection and remove the old collection in the background. Lets walk through an example where live resharding would really help a user: The user has an orders collection. In the past, they needed to scale out and chose the order_id field as the shard key. Now they realize that they have to regularly query each customer’s orders to quickly display order history. This query does not use the order_id field. To return the results for such a query, all shards need to provide data for the query. This is called a scatter-gather query. It would have been more performant and scalable to have orders for each customer localized to a shard, avoiding scatter-gather, cross-shard queries. They realize that the optimal shard key would be "customer_id: 1, order_id: 1" rather than just the order_id . With MongoDB 5.0’s live resharding, the user can just run the reshard command, and MongoDB will reshard the orders collection for them using the new shard key, without having to bring the database and the application down. Watch our short Live Resharding talk from MongoDB.Live 2021 to see a demo with this exact example. Not only can you change the field(s) for a shard key, you can also review your sharding strategy, changing between range, hash, and zones. Live Resharding: Performance and Operational Considerations Even with the flexibility that live resharding gives you, it is still important to properly evaluate the selection of your shard key. Our documentation provides guidance to help you make the best choice of shard key . Of course, live resharding makes it much easier to change that key should your original choice have not been optimal, or if your application changes in a way that you hadn’t previously anticipated. If you find yourself in this situation, it is essential to plan for live resharding. What do you need to be thinking about before resharding Make sure you have sufficient storage capacity available on each node of your cluster. Since MongoDB is temporarily cloning your existing collection, spare storage capacity needs to be at least 1.2x the size of the collection you are going to reshard. This is because we need 20% more storage in order to buffer writes that occur during the resharding process. For example, if the size of the collection you want to reshard is 2 TB compressed, you should have at least 2.4 TB of free storage in the cluster before starting the resharding operation. While the resharding process is efficient, it will still consume additional compute and I/O resources. You should therefore make sure you are not consistently running the database at or close to peak system utilization. If you see CPU usage in excess of 80% or I/O usage above 50%, you should scale up your cluster to larger instance sizes before resharding. Once resharding is done, it's fine to scale back down to regular instance sizes. Before you run resharding, you should update any queries that reference the existing shard key to include both the current shard key and the new shard key. When resharding is complete, you can remove the old shard key from your queries. Review the resharding requirements documentation for a full run down on the key factors to consider before resharding your collection. What should you expect during resharding? Total duration of the resharding process is dependent on the number of shards, the size of your collection, and the write load to your collection. For a constant data size, the more shards the shorter the resharding duration. From a simple POC on MongoDB Atlas, a 100 GB collection took just 2 hours 45 minutes to reshard on a 4-shard cluster and 5 hours 30 minutes on a 2-shard cluster. The process scales up and down linearly with data size and number of shards – so a 1 TB collection will take 10 times longer to reshard than a 100GB collection. Of course your mileage may vary based on the read/write ratio of your application along with the speed and quality of your underlying hardware infrastructure. While resharding is in flight, you should expect the following impacts to application performance: The latency and throughput of reads against the collection that is being resharded will be unaffected . Even though we are writing to the existing collection and then applying oplog entries to both its replicas and to the cloned collection, you should expect to see negligible impact to write latency given enough spare CPU. If your cluster is CPU-bound, expect a latency increase of 5 to 10% during the cloning phase and 20 to 50% during the applying phase (*) . As long as you meet the aforementioned capacity requirements, the latency and throughput of operations to other collections in the database won't be impacted . (*) Note: If you notice unacceptable write latencies to your collection, we recommend you stop resharding, increase your shard instance sizes, and then run resharding again. The abort and cleanup of the cloned collection are instantaneous. If your application has time periods with less traffic, reshard your collection during that time if possible. All of your existing isolation, consistency, and durability guarantees are honored while resharding is running. The process itself is resilient and crash-safe, so if any shard undergoes a replica set election, there is no impact to resharding – it will simply resume when the new primary has been elected. You can monitor the resharding progress with the $currentOp pipeline stage. It will report an estimate of the remaining time to complete the resharding operation. You can also abort the resharding process at any time. What happens after resharding is complete? When resharding is done and the two collections are in sync, MongoDB will automatically cut over to the new collection and remove the old collection for you, reclaiming your storage and returning latency back to normal. By default, cutover takes up to two seconds — during which time the collection will not accept writes, and so your application will see a short spike in write latency. Any writes that timeout are automatically retried by our drivers , so exceptions are not surfaced to your users. The cutover interval is tunable: Resharding will be quicker if you raise the interval above the two second default, with the trade-off that the period of write unavailability will be longer. By dialing it down below two seconds, the interval of write unavailability will be shorter. However, the resharding process will take longer to complete, and the odds of the window ever being short enough to cutover will be diminished. You can block writes early to force resharding to complete by issuing the commitReshardCollection command. This is useful if the current time estimate to complete the resharding operation is an acceptable duration for your collection to block writes. What you Get with Live Resharding Live sharding is available wherever you run MongoDB – whether that’s in our fully managed Atlas application data platform in the cloud , with Enterprise Advanced , or if using the Community Edition of MongoDB. To recap how you benefit from live resharding: Evolve with your apps with simplicity and resilience: As your applications evolve or as you need to improve on the original choice of shard key, a single command kicks off resharding. This process is automated, resilient, and non-disruptive to your application. Compress weeks/months to minutes/hours: Live resharding is fully automated, so you eliminate disruptive and lengthy manual data migrations. To make scaling out even easier, you can evaluate the effectiveness of different shard keys in dev/test environments before committing your choice to production. Even then, you can change your shard key when you want to. Extend flexibility and agility across every layer of your application stack: You have seen how MongoDB’s flexible document data model instantly adapts as you add new features to your app. With live resharding you get that same flexibility when you shard. New features or new requirements? Simply reshard as and when you need to. Summary Live Resharding is a huge step forward in the state of distributed systems, and is just the start of an exciting and fast-paced MongoDB roadmap that will make sharding even easier, more flexible, and automated. If you want to dig deeper, please take a look at the Live Resharding session recording from our developer conference and review the resharding documentation . To learn more about MongoDB 5.0 and our new Rapid Releases, download our guide to what’s new in MongoDB .
10 Signs Your Data Architecture Is Limiting Your Innovation: Part 3
When it comes to your database architecture, complexity can quickly lead to a drag on your productivity, frustration for your developers, and less time to focus on innovation while your team maintains the status quo. New feature rollouts take longer than they should, while your resources are consumed up by tedious tasks that allow your app to survive, but not truly thrive. This complexity manifests in many different ways; as they accumulate, they can become a serious hindrance to your ability to bring innovative ideas to market. We think of the effect as a kind of tax — a tax that is directly rooted in the complexity of your data architecture. We call it DIRT — the Data and Innovation Recurring Tax . We have identified ten symptoms that can indicate your business is paying DIRT. For an in-depth view, read our white paper 10 Signs Your Data Infrastructure is Holding You Back . Sign #5: New features are rolled out in months, not days With a complex data architecture, your developers have to switch constantly between languages and think in different frameworks. They may use one language to work directly with a database, another to use the object-relational mapping (ORM) layer built on top of it, and yet another to access search functionality. That becomes a major drag on productivity. That slows down your individual developers, but it also has consequences for how they work as a team. If every application architecture is bespoke, it’s almost impossible for developers’ skills to be shared and put to use across an organization. Development slows down. When a key person leaves, there is no one who can effectively fill in and you end up hiring for very specific skills. That’s hard enough, but you also don’t know if you’ll still need those skills in a year or three. Sign #6: It takes longer to roll out schema changes than to build new features If you’re rolling out application changes frequently — or trying to — and you’re using a relational database, then schema changes are hard to avoid. One survey found that 60% of application changes require modifications to existing schema, and, worse, those database changes take longer to deploy than the application changes they are supposed to support. Legacy relational databases require developers to choose a schema at the outset of a project, before they understand the entirety of the data they need or the ways in which their applications will be used. Over time, and with user feedback, the application takes shape — but often it’s not the shape that was originally anticipated. At that point, a fixed schema makes it very hard to iterate, leaving teams with a tough choice: try to achieve your new goals within the context of a schema that isn’t really suitable or go through the painful process of changing it. Learn more about the innovation tax and how to lessen it in our white paper DIRT and the High Cost of Complexity .
Faster Migrations to MongoDB Atlas on Google Cloud with migVisor by EPAM
As the needs of Google Cloud customers evolve and shift towards new user expectations, more and more customers are choosing the MongoDB Application Data Platform as an ideal alternative to legacy databases. Together, we’ve partnered with users looking to digitize and grow their businesses (such as Forbes ), or meet increased demand due to COVID (such as our work with Boxed , the online grocer) by scaling up infrastructure and data processing within a condensed time frame. As a fully-managed service within the Google Cloud Marketplace , MongoDB Atlas enables our joint customers to quickly deploy applications on Google Cloud with a unified user experience and an integrated billing model. Migrations to managed cloud database services vary in complexity, but even under the most straightforward circumstances, careful evaluation and planning is required. Customer database environments often leverage database technologies from multiple vendors, across different versions, and can run into thousands of deployments. This makes manual assessment cumbersome and error prone. This is where EPAM Systems , a provider with strategic specialization in database and application modernization solutions, comes in. EPAM’s database migration assessment tool, migVisor , is a first-of-its-kind cloud database migration assessment product that helps companies analyze database workloads, configuration, and structure to generate a visual cloud migration roadmap that identifies potential quick wins as well as challenge areas. migVisor identifies the best migration path for databases using sophisticated scoring logic to rank the complexity of migrating them to a cloud-centric technology stack. Previously applicable only to migrations from RDBMS to cloud-based RDBMS, migVisor is now available for MongoDB to MongoDB Atlas migrations. migVisor helps you: Analyze migration decisions objectively by providing a secure assessment of source and target databases that’s independent of deployed environments Accelerate time to migration by automating the discovery and assessment process, which reduces development cycles from a few weeks to a few days Easily understand tech insights by providing a visual overview of your entire journey, enabling better planning and improving stakeholder visibility Reduce database licensing costs by giving you intelligent insights on the target environment and recommended migration paths Key features of migVisor for MongoDB For several years, migVisor by EPAM has delivered automated assessments that have helped hundreds of customers migrate their relational databases to cloud-based or cloud-native databases. Now, migVisor adds support for the world’s leading modern data platform: MongoDB. As part of the initial release, migVisor will support self-managed MongoDB to MongoDB Atlas migration assessments. We plan to support TCO for MongoDB migrations, application modernization, migration assessment, and relational MongoDB migration assessments in future releases. MongoDB is also a natural fit for Google Cloud’s Open Cloud strategy of providing customers a broad set of fully managed database services, as Google Cloud's own GM and VP of Engineering & Databases, Andi Gutmans, notes: We are always looking for ways to simplify migrations for our customers. Now, with EPAM's database migration assessment tool, migVisor, supporting MongoDB Atlas, our customers can easily complete database assessments—including TCO analyses and migration complexity assessments, and generate comprehensive migration plans. A simplified migration experience combined with our joint Marketplace success enables customers to consolidate their data workloads into the cloud while making the development and procurement process simple—so users can focus more on innovation. How the migVisor assessment works migVisor analyzes source databases (on-prem or in any cloud environment) for migration assessment to a new target. The assessment includes the following steps: The simple-to-use migVisor Metadata Collector (mMC) collects metadata from the source database, including: featureCompatibilityVersion value, journaling status for data bearing nodes, MongoDB storage size used, replica set configuration, and more. Figure 1: mMC GUI Edit Connection Screen On the migVisor Analysis Dashboard you can select the source/target pair (e.g., MongoDB to MongoDB Atlas on Google Cloud). Figure 2: Source and Target Selection In the migVisor console, you can then view the automated assessment output that was created from migVisor’s migration complexity scoring engine, including classification of the migration into high/medium/low complexity and identification of potential migration challenges and incompatibilities. Figure 3: Source Cluster Features Finally, you can also export the assessment output in CSV format for further analysis in your preferred data analysis/reporting tool. Conclusion Together, Google Cloud and MongoDB have successfully worked with many organizations to streamline cloud migrations and modernize their legacy landscape. To build on the foundation of providing our customers with the best-in-class experience, we’ve closely worked with Google Cloud and EPAM Systems to integrate MongoDB Atlas with migVisor. Because of this, customers will now be able to better plan migrations, reduce risk and avoid missteps, identify quick wins for TCO reduction, review migration complexities, and appropriately plan migration phases for the best outcomes. Learn more about how you can deploy, manage, and grow MongoDB on Google Cloud on our partner page . If you’d like guidance and migration advice, please reach out to email@example.com to get in touch with the Google, MongoDB, and EPAM Sales teams.
Revolutionizing Data Storage and Analytics with MongoDB Atlas on Google Cloud and HCL
Every organization requires data they can trust—and access—regardless of its format, size, or location. The rapid pace of change in technology and the shift towards cloud computing is revolutionizing how companies handle, govern and manage their data by freeing them from the heavy operational burden of on-premise deployments. Enterprises are looking for a centralized, cost-effective solution that allows them to scale their storage and analytics so they can ingest data and perform artificial intelligence (AI) and machine learning (ML) operations, ultimately expanding their marketing horizon. This blog post explores why companies should partner with MongoDB Atlas on Google Cloud to begin their data revolution journey, and how HCL Technologies can support customers looking to migrate. MongoDB Atlas as the distributed data platform MongoDB Atlas is the leading database-as-a-service on the market for three main reasons: Unparalleled developer experience - allows organizations to bring new features to market at a high velocity Horizontal scalability - supports hundreds of terabytes of data with sub-second queries Flexibility - stores data to meet various regulatory, operational, and high availability requirements. The versatility offered by MongoDB’s document model makes it ideal for modern data-driven use cases that require support for structured, semi-structured, and unstructured content all within a single platform. Its flexible schema allows changes to support new application features without costly schema migrations typically required with relational databases. MongoDB Atlas extends the core database by offering services like Atlas Search and MongoDB Realm that are a necessity for modern applications. Atlas Search provides a powerful Apache Lucene-based full text search engine that automatically indexes data in your MongoDB database without the need for a separate dedicated search engine or error-prone replication processes. Realm provides edge-to-cloud sync and backend services to accelerate and simplify mobile and web development. Atlas’ distributed architecture supports horizontal scaling for data volume, query latency, and query throughput which offers the scalability benefits of distributed data storage alongside the rich functionality of a fully-featured general purpose database. MongoDB Atlas is unique in its ability to provide the most wanted database as a managed service and is relied on by the world’s largest companies for their mission-critical production applications. Innovation powered by collaboration with HCL Technologies MongoDB’s versatility as a general-purpose database, in addition to its massive scalability, makes it a perfect foundation for analytics, visualization, and AI/ML applications on Google Cloud. As an MSP partner for Google Cloud, HCL Technologies helps enterprises accelerate and risk-mitigate their digital agenda, powered by Google Cloud. We’ve successfully implemented applications leveraging MongoDB Atlas on Google Cloud, building upon MongoDB’s flexible JSON-like data model, rich querying and indexing, and elastic scalability in conjunction with Google Cloud’s class-leading cloud infrastructure, data analytics, and machine learning capabilities. HCL is working with some of the world’s largest enterprises in building secure, performant, and cost-effective solutions with MongoDB and Google. Possessing technical expertise in Google Cloud, MongoDB, machine learning, and data science, our dedicated team developed a reference architecture that ensures high performance and scalability. This is simplified by MongoDB Atlas’ support for Google Cloud services which allows it to essentially operate as a cloud-native solution. Highlighted features include: Integration with Google Cloud Key Management Service Use of Google Cloud’s native storage snapshot for fast backup and restore Ability to create read-only MongoDB nodes in Google Cloud to reduce latency with Google Cloud-native services regardless of where the primary node is located (even other public cloud providers!) Integrated billing with Google Cloud Ability to span a single MongoDB cluster across Google Cloud regions worldwide, and more As represented in Figure 1 below, MongoDB Atlas on Google Cloud can be used as a single database solution for transactional, operational, and analytical workloads across a variety of use cases. Figure 1: MongoDB's core characteristics and features The following architecture in Figure 2 demonstrates the ease of reading and writing data to MongoDB from Google Cloud services. Dataflow, Cloud Data Fusion, and Dataproc can be leveraged to build data pipelines to migrate data from heterogeneous databases to MongoDB and to feed data to create interactive dashboards using Looker. These data pipelines support both batch and real-time ingestion workloads and can be automated and orchestrated using Google Cloud - native services.. Figure 2: MongoDB Atlas' integration with core Google Cloud services A data platform built using MongoDB Atlas and Google Cloud offers an integrated suite of services for storage, analysis, and visualization. Address your business challenges with HCL: Industry use cases Data-driven solutions built with MongoDB Atlas on Google Cloud have multiple applications across industries such as financial services, media and entertainment, healthcare, oil and gas, energy, manufacturing, retail, and the public sector. Every industry can benefit from this highly integrated storage and analytical solution. Use Cases and Benefits Data lake modernization with low cost and high availability for media and entertainment customers: Maintaining high availability and a low-cost data lake is an obstacle for any online entertainment platform that builds mobile or web ticketing applications. However, building on Google App Engine with MongoDB Atlas Clusters in the backend allows for a high-availability, low-cost data platform that seamlessly feeds data to downstream analytics platforms in real time. Unified data platform for retail customers: The retail business frequently requests an agile environment in order to encourage innovation among its engineers. With its agility in scaling and resource management, seamless multi-region clusters, and premium monitoring, running MongoDB Atlas on Google Cloud is a fantastic choice for building a single data platform. This simplifies the management of different data platforms and allows developers to focus on new ideas. High-speed real-time data platform of supply chain system for manufacturing units: By having real-time visibility and distributed data services, supply chain data can become a competitive advantage. MongoDB Atlas on Google Cloud provides a solid foundation for creating distributed data services with a unified, easy-to-maintain architecture. The unrivaled speed of MongoDB Atlas simplifies supply chain operations with real-time data analytics. The way forward Even in just the past decade, organizations have been forced to adapt to the extremely fast pace of innovation in the data analytics landscape: moving from batch to real-time, on-premise to cloud, gigabytes to petabytes, and the increased accessibility of advanced AI/ML models thanks to providers like Google Cloud. With our track record of success in this domain, HCL Technologies is uniquely positioned to help organizations realize the joint benefits of building data analytics applications with best-of-breed solutions from Google Cloud and MongoDB. Visit us to learn more about the HCL Google Ecosystem Business Unit and how we can help you harness the power of MongoDB Atlas and Google Cloud Platform to change the way you store and analyze your data through these solutions.
Retail Tech in 2022: Predictions for What's on the Horizon
If 2020 and 2021 were all about adjusting to the Covid-19 pandemic, 2022 will be about finding a way to be successful in this “new normal”. So what should retailers expect in the upcoming year, and where should you consider making new retail technology investments? Omnichannel is still going strong Who would have anticipated the Covid-19 pandemic would still be disrupting lives after two years? For the retail industry this means more of the same - omnichannel shopping. Despite the hope many of us had for the end of the pandemic and the gradual increase of in-person shopping, retail workers can expect to continue accommodating all kinds of shopping experiences – online shopping, brick and mortar shopping, buy online and pick up in store, reserve online and pick up in store. Even beyond the pandemic, the face of shopping is likely forever changed. This means retailers need to start considering the long-term tech investments required to meet transforming customer expectations. Adopting solutions that offer a single view of the consumer gives you the unique opportunity to personalize offerings, products and loyalty programs to their demand. With a superior consumer experience, you can achieve repeat business and increased customer loyalty. While many retailers may have thought they could “get by” with their current solutions until the pandemic ends, it’s time to rethink that approach and start exploring more long-term solutions to improve omnichannel shopping experiences. Leaner tech stacks over many specialized solutions In 2022, you should explore solutions that allow your IT teams to do more with less. The typical retail tech stack looks something like the diagram below. Legacy, relational databases are supplemented by other specialist NoSQL and relational databases, and additional mobile data and analytics platforms. As a result, retailers looking to respond quickly to changing consumer preferences and improve the customer experience face an uphill battle against siloed data, slow data processing, and unnecessary complexity. Your development teams are so busy cobbling solutions together and maintaining different technologies at once that they fail to innovate to their full potential, so you’re never quite able to pull ahead of the competition. This is the data innovation recurring tax (or DIRT) . Think of this as the ongoing tax on innovation that spaghetti architectures, like the example above, legacy architecture costs your business. As technology grows more sophisticated and data grows more complex, companies are expected to react almost instantaneously to signals from their data. Legacy technologies, like relational databases, are rigid, inefficient, and hard to adapt, making it difficult to deliver true innovation to your customers and employees in a timely manner. Your development teams are so busy cobbling solutions together that they fail to innovate to their full potential, so you’re never quite able to pull ahead of the competition. It’s time to rethink your legacy systems, and adopt solutions that streamline operations and seamlessly share data to ensure you’re working with a single source of data truth. Many retailers recognize the need to upgrade legacy solutions and get away from multiple different database technologies, but you may not know where to start. Look for modern data applications that simplify data collection from disparate sources and include automated conflict resolution for added data reliability. Also, consider what you could do with fully managed application data platforms, like MongoDB Atlas . With someone else doing the admin work, your developers are free to focus on critical work or turn their talents to innovation. Digital worker enablement will increase retention For employees, 2022 looks set to continue last year’s trend of the “ Great Resignation ”. To combat worker fatigue, and retain your workforce you need to prioritize worker engagement. One way to better engage your employees is through mobile workforce enablement. While many companies consider how to engage their customers with a more digital-friendly work environment, you shouldn’t forget about your workers in the process. Global companies like Walmart are starting to invest in mobile apps to enable their workforce. A modern, always-on retail workforce enablement app could transform the way your employees do their jobs. Features like real-time view of stock, cross-departmental collaboration, detailed product information, instant communication with other stores can simplify your workers’ experiences and help them to better serve your customers. Your workers need an always-on app that syncs with your single source of data truth, regardless of connectivity (which may be an issue as retail workers are constantly on the move). But building a mobile app with data sync capabilities can be a costly and time-intensive investment. MongoDB Realm Sync solves for this with an intuitive, object-oriented data model that is simple to use, and an out-of-the-box data synchronization service. When your mobile data seamlessly integrates with back-end systems, you can deliver a modern, distributed application data platform to your workers. Huge investment in the supply chain From microchips to toilet paper, disruptions in the supply chain were a huge issue in 2020 and 2021, and the supply chain pain continues in 2022. And while there continue to be supply chain issues beyond the control of retailers, there are steps that can be taken to mitigate some of the pain and prepare for future disruptions. Warehouse tech is getting smarter, and you need to upgrade your solutions to keep up. For starters, consider adopting the right application data platform to unify siloed data and gain a single view of operations . A single view of your data will allow for better management of store-level demand forecasts, distribution center-to-store network optimizations, vendor ordering, truck load optimizations, and much more. With a modern application data platform, all this data feeds into one, single view application, giving retailers the insights to react to supply chain issues in real time. With disruption set to dominate 2022, as it did in 2020 and 2021, investing in proactive solution upgrades could help your business not only survive, but thrive. Want to learn more about gaining a competitive advantage in the retail industry? Get this free white paper on retail modernization .
Ventana Research's Latest Report Highlights MongoDB's Role as a Cloud Data Platform Provider
Ventana Research, a market advisory and research firm, recently published an Analyst Perspective on MongoDB, noting that MongoDB and its application data platform provide businesses the ability to accelerate development and data-driven decision-making. As Ventana Research explains the evolution from traditional databases to modern, cloud-based application data platforms, the study covered multiple trends related to both the present and future of data platform software. We have identified six key trends from MongoDB that were represented in the Ventana Research Analyst Perspective. Non-relational, or NoSQL, databases are on the rise . We see this as evidence of an unprecedented, widespread change in how businesses perceive and use their databases. Cloud-based services and products are rapidly gaining popularity . Given the rise of real-time, data-driven applications, organizations are relying more and more on the flexibility, availability, and functionality of cloud-native data platforms. Such products are ideal for quickly building competitive products, delivering highly personalized experiences, and improving business agility. As a result, operational database requirements will only become more demanding . As applications become more advanced, databases will become a pivotal part of an organization’s success — or failure. We believe that in order to keep up with their applications (and their competition), companies require a comprehensive, powerful application data platform like MongoDB Atlas. Convergence is the name of the game . As companies seek out new and better operational data platforms, both relational and non-relational database providers will venture into areas that were traditionally dominated by their competitors. Examples include non-relational databases (like MongoDB) adding relational features like ACID transactions, or relational databases offering compatibility for non-relational data models like graphs or documents. Companies are increasingly opting for hybrid and multi-cloud models . MongoDB Atlas’ multi-cloud clusters enable users to leverage exclusive provider features (like Google Cloud’s AI tools), improve availability in geographic regions, or migrate data across clouds with no downtime. Non-relational, cloud-native databases are becoming more powerful — and more attractive to customers . Thanks to convergence and competition, non-relational databases are becoming ever more capable. Their advancements include real-time analytics, rich visualizations, and mobile data sync and storage. Read Ventana Research Analyst Perspectives to gain insight into the current data landscape and the possibilities of tomorrow. Updated January 17, 2022.
Data and the European Landscape: 3 Trends for 2022
The past two years have brought massive changes for IT leaders: large and complex cloud migrations; unprecedented numbers of people suddenly working, shopping and learning from home; and a burst in demand for digital-first experiences. Like everyone else, we are hoping that 2022 isn’t so disruptive (fingers crossed!), but our customer conversations in Europe do lead us to believe the new year will bring new business priorities. We’re already noticing changes in conversations around vendor lock-in, thanks to the Digital Markets Act, a new enthusiasm for combining operational and analytical data to drive new insights faster, and a more strategic embrace of sustainability. Here’s how we see these trends playing out in 2022. Digital markets act draws new attention to cloud vendor lock-in in Europe We’ve heard plenty about the European Commission’s Digital Markets Act , which, in the name of ensuring fair and open digital markets, would place new restrictions on companies that are deemed to be digital “gatekeepers” in the region. That discussion will be nothing compared to the vigorous debate we expect once the EU begins the very tricky political business of determining exactly which companies will fall under the act. If the EU sets the bar for revenues, users, and market size high enough, it’s possible that the regulation will end up affecting only Facebook, Amazon, Google, Apple, and Microsoft. But a European group representing 2,500 CIOs and almost 700 organisations is now pushing to have the regulation encompass more software companies. Their main concern centers around “distorted competition” in cloud infrastructure services and a worry that companies are being locked into one cloud vendor. A trend that will likely increase in 2022 that pushes back on cloud vendor lock-in is embracing multi-cloud strategies. We should expect to see more organisations in the region pursuing multi-cloud environments as a means to improve business continuity and agility whilst being able to access best of breed services from each cloud provider. As we have always said …”it’s fine to date your cloud provider….but don’t ever marry them.” The convergence of operational and analytical data The processing of operational and analytical data is almost always contained in different data systems, each tuned to that use case and managed by separate teams. But because that data lives in separate places, it’s almost impossible for organisations to generate insights and automate actions in real time, against live data. We believe 2022 is the year we’ll see a critical mass of companies in the region make significant progress toward a convergence of their operational and analytical data. We’re already starting to see some of the principles of microservices in operational applications, such as domain ownership, be applied to analytics as well. We’re hearing about this from so many of our customers locally, who are looking at MongoDB as an application data platform that allows them to perform queries across both real-time and historical data, using a unified platform and a single query API. This results in the applications they are building becoming more intelligent and contextual to their users, while avoiding dependencies on centralized analytics teams that otherwise slow down how quickly new, data-driven experiences can be released. Sustainability drives local strategic IT choice Technology always has some environmental cost. Sometimes that’s obvious — such as the energy needs and emissions associated with Bitcoin mining. More often, though, the environmental costs are well hidden. The European Green Deal commits the European Union to reducing emissions by 55% by 2030, with a focus on sustainable industry. With the U.N. Climate Change Conference (COP26) recently completed in Glasgow, and coming off the hottest European summer on record, climate issues have become top of mind. That means our customers are increasingly looking to make their technical operations more sustainable — including in their choice of cloud provider and data centers. According to research from IDC , more than 20% of CxOs say that sustainability is now important in selecting a strategic cloud service provider, and some 29% of CxOs are including sustainability into their RFPs for cloud services. Most interesting, 26% say they are willing to switch to providers with better sustainability credentials. Historically, it’s been difficult to make a switch like that. That’s part of the reason we built MongoDB Atlas — to give our customers the flexibility to run in any region , with any of the three largest cloud providers, and to make it easy to switch between them, and even to run a single database cluster across them. Publicly available information about the footprint of individual regions and even single data centers will make it simpler for companies to make informed decisions. Already, at least one cloud platform has added indicators to regions with the lowest carbon footprint. So while we hope 2022 will not be as disruptive as the years gone by, it will still bring seminal changes to our industry. These changes will also prompt organisations toward more agile, cohesive and sustainable data platform strategies as they seek to gain competitive advantage and exceed customer expectations. Source: IDC, European Customers Engage Services Providers at All Stages of Their Cloud Journey, IDC Survey Spotlight, Doc #EUR248484021, Dec 2021
Joyce, a Decentralized Approach to Foster Business Agility
Despite all of the tools and methodologies that have arisen in the last few years, many companies, particularly those that have been in the market for decades, struggle when it comes to leveraging their operational data to build new digital products and services. According to research and surveys conducted by McKinsey over the last few years, the success rate of digital transformations is consistently low, with less than 30% succeeding at improving their company’s performance. There are a lot of reasons for this, but most of them can be summarized in a sentence: A digital transformation is primarily an organizational and cultural change and then a technological shift. The question is not if digital transformation is a good thing nor is it if moving to the cloud is a good choice. Companies need (badly, in some cases) a digital transformation and yes, the pros of moving to the cloud usually overcome the cons. So, let’s try to dig deeper and analyze three of the main problems companies face when they go on this journey Digital products development Products by nature are customer-driven but companies run their businesses on multiple back-end systems that are instead purpose-driven. Unless you run a very small business, different people with different objectives have ownership of such products and systems. Given this context, what happens when a company wants to launch a new digital product at speed? The back-end systems (CRMs, E-commerce, ERP, etc.) hold the data they need to bring to the customer. Some systems are SaaS, some are legacy, and perhaps others are custom applications created by the company that disrupted the market with innovative solutions back in the days, the perfect recipe for integration hell. The product manager needs to coordinate and negotiate multiple change requests with the system’s owners whilst trying to convince them to add their needs in the backlog to meet the deadline. And things get even worse, as the new product relies on the computational power of the source systems, and if those systems cannot handle the additional traffic, both the product and the core services will be affected. Third-party integration “Everybody wants the change, (almost) nobody wants to change.” In this ever-growing digital world, partnering with third parties (whether they are clients or service providers) is crucial, but everyone who has tried to do so knows how challenging this is: non-standard interfaces, CSV files over FTP with fancy update rules, security issues… The list of unwanted things can grow indefinitely. SaaS everywhere The Software-as-a-Service model is extremely popular and getting the service you want without worrying about the underlying infrastructure gives freedom and speed of adoption, but what happens when a big company relies on multiple SaaS products to run their business? Sooner or later, they experience loss of control and higher costs in keeping a consistent view of the big picture. They need to deal with SaaS internal representations of their own data, multiple views of the same domain concept, unplanned expenses to export, and interpret and integrate the data from different sources with different formats. Putting it all together All the issues above fall into a well-known category of information technology. They are integration problems, and over the years, a lot of vendors promised a definitive solution. Now, you can consider low-code/no-code platforms with hundreds of ready-made connectors and modern graphical interfaces. Problem solved, right? Well, not really. Low-code integration platforms simplify implementation. They are really good at it, but doing so oversimplifies the real challenge: creating and maintaining a consistent set of APIs shaped around the business value over time, and preventing the interfaces from leaking internal complexities to the rest of the company, something that has to be defined and maintained through architectural choices and proper skills (completely hidden behind the selling points of such platforms). There are two different ways to solve integration problems: Centralized using adapters. In this case, the logic is pushed to the central orchestration component, with integration managed through a set of adapters. This is the rather old school SOA approach, the one that the majority of market integration platforms are built on. Decentralized, pushing the logic to the edges, giving autonomous teams the freedom to define both the boundaries and the APIs that a domain must expose to deliver business value. This is a more modern approach that has arisen recently alongside the rise of microservices and, in the analytical world, with the concept of data mesh. The former gives speed at the starting point and the illusion of reducing the number of choices and skills to manage the problems, but in the long run, inevitably, this begins to accumulate technical debt. Due to the lack of necessary degrees of freedom, you lose the ability to evolve the integration points over time, the same thing that caused the transition from SOA to microservices architectures. The latter needs the relevant skills, vision, and ability to execute but gives immediate results and allows you to flexibly manage the evolution of the enterprise architecture over time. Old problems, new solutions At Sourcesense in the last 20 years, we have partnered on hundreds of projects to bring agility, speed, and new open-source technology to our customers. Many times through the years, we were faced with the integration challenges above, and yes, we tried to solve them with the technology available at the time, so we have built some integration solutions on SOA (when they were the best of breed) and interacted with many of the integration platforms on the market. Then, we struggled with the issues and limitations of the integration landscape and have listened to our customers’ needs and where expectations have fallen short. The rise of agile methodologies, cloud computing, new techniques, technologies, and architectural styles has given an unprecedented boost to software evolution and the ability to support business needs, so we embraced the new wave and now have growing experience in solving problems with these tools. Along the way, we’ve seen a recurring pattern when we encountered integration problems, the effectiveness of data hubs as components of the enterprise architectures to solve these challenges, so we built one of our own: Joyce. Data hubs This is a relatively new term and refers to software platforms that collect data from different sources with the main purpose of distribution and sharing. Since this definition is broad and vague, let’s add some other key elements that matter and help define the contours of our implementation. Collecting data from different sources can bring three major benefits: Computational decoupling from the sources. Pulling (or pushing) the data out of the originating systems means that client applications and services interact with the hub and not directly with the sources, preventing them from being slowed down by additional traffic. Catalog and discoverability. If data is collected correctly, this leads to the creation of a catalog, allowing people inside the organization to search, discover, and use the data inside the hub. Security. The main purpose of the hubs is distribution and sharing. This leads immediately to focus on access control and security hardening. A single access point simplifies the overall security around the data because it significantly reduces the number of systems the clients have to interact with to gather the data they need. Joyce, how it works The cornerstone concept of Joyce is the schema. It allows you to shape the ingested data and how this data will be made available to client services. Using the same declarative approach made popular by Kubernetes, the schemas describe the expected result and the platform performs the actions to make it happen. Schemas are standard JSON schema files stored and classified in a catalog. Their definition falls into three categories: Input – how to gather and shape the source data. We leverage the Kafka Connect framework to provide ready-made connectors for a wide variety of sources. The ingested data can be filtered, formatted, and enriched with transformation handlers (domain-specific extensions of JSON schema). Model – allows you to create new aggregates from the data stored in the platform. This feature gives the freedom to model the data the way needed by client services. Export – bulk data export capability. Exported data can be any query run against the existing data with an optional temporal filter. Input and model data is made available to all the client services with the proper authorization grants through auto-generated REST and GraphQL APIs. It is also possible to subscribe to a dedicated topic if an event-driven approach is more suitable for the use-case. MongoDB: the key for a flexible model and performance at scale We heavily rely on MongoDB. Thanks to its flexibility, we can easily map any data structure the user defines to collect the data. Half of the schema definition is basically the definition of a MongoDB schema. (We also auto-generate one schema per collection to guarantee data integrity.) Joyce runs in a Kubernetes cluster and all its services are inherently stateless to exploit the full potential of horizontal scaling. The architecture is based on the CQRS pattern. This means that writes and reads are completely decoupled and can scale independently to meet the unique needs of the production environment. MongoDB is also the backing database of the API layer so we can keep the promise of low latency, high throughput, and continuous availability along all the components of the stack. The platform is available as a fully managed PaaS on the three major cloud providers (AWS, Azure, GCP) but if needed, it can be installed on an existing infrastructure (in cloud and on prem). Final considerations There are many challenges leaders must face for a successful digital transformation. They need to guide their organizations along a process that involves changes on many levels. The exponential growth of technological solutions in the last few years adds more complexity and confusion. The evolution of organizational models and methodologies point in the direction of shared responsibility, people empowerment, and autonomous teams with a light and effective central governance. The same evolution also permeates the novel approaches to enterprise architectures like the data mesh. Unfortunately, there’s no silver bullet, just the right choices for the given context. Despite all the marketing and hype around this or that one solution to all of your digital transformation needs, a long term successful shift needs guidance, competence and empowerment. We’ve built Joyce with the aim of reducing the burden of repetitive tasks and boilerplate code to get the results faster and catch the low hanging fruits without trying to replace the necessary architectural thinking to properly define the current state and the evolution of the enterprise architectures of our customers. If you’re struggling with the problems enlisted at the beginning of this article you should give Joyce a try. Learn more about Joyce
FHIR Technology is Driving Healthcare's Digital Revolution
Technology supporting healthcare’s digital transformation is so pervasive that the question isn’t what technology to choose, but rather, what problems need to be solved. Advancing technology and access to secure and real-time data analytics will vastly improve patients’ health and happiness, and growing interoperability standards are pushing organizations forward in their digital transformations. Together with the Healthcare Information and Management Systems Society (HIMSS) and leading healthcare insurance provider Humana , MongoDB recently released a three-part podcast series chronicling the ways Fast Healthcare Interoperability Resources (FHIR), AI, and the cloud are reshaping healthcare for the better. Here’s a quick roundup of our discussions. Data is the future of healthcare . Whether providers are driving patient engagement through wearable devices, wellness programs or connected care, data will take healthcare to the next digital frontier. We’ll see these advancements through AI, FHIR, and the cloud. FHIR is revolutionizing healthcare technology . Not only is FHIR implementation a requirement, it’s also a crossroads for data architects. Choosing the right approach has deep implications for healthcare IT. The operational data layer (ODL) approach to interoperability makes the impossible possible . Through Humana’s digital transformation journey, it became clear that meaningful progress isn’t possible using core legacy database systems. AI, FHIR, and the cloud: Why data is the future of healthcare In this episode , we dive into what a digital transformation would look like for the healthcare industry, and what are some of the biggest technology challenges facing healthcare today. A digitally transformed healthcare industry will weave real-time data analytics with more personalized care. Patients today want a more modern healthcare experience that includes telemedicine, digital forms and touchless mobile check ins. The end goal is simple: maximize the human experience while advancing away from legacy technology systems that slow down both healthcare practitioners and patients. When it comes to today’s biggest healthcare challenges, the cloud stands out as a key driver of promise and peril. The promise is that we can build applications, go to market and reach patients through wellness programs more quickly. The peril lies in the infrastructure, which is unknown to many healthcare organizations. This presents a unique challenge for the architects and certainly the developers at organizations with older legacy systems. The challenge here is avoiding a simple left hand shift or cloud for the sake of cloud, and moving from simple modernization to actual transformation. Listen below to hear the entire conversation Your browser does not support the audio element. Bring the FHIR inside for digital transformation In episode 2 , HIMSS and MongoDB take a closer look at why FHIR is a change agent in healthcare technology, and how healthcare organizations globally are using the new data standard to jump start legacy modernization and digital transformation. What is FHIR? The FHIR standard is a common set of schema definitions and APIs that helps providers and patients manage and exchange healthcare data. Using FHIR, records provided by healthcare organizations are standardized into a common data model over rest-based APIs. It makes the data that healthcare providers and payers use easier to exchange. Growing regulatory pressure has accelerated U.S. FHIR adoption among healthcare organizations and technology vendors.The Centers for Medicare and Medicaid Services (CMS) started a rolling deadline for FHIR compliance in 2020, with fines for institutions that fall behind. As a result, for most U.S.-based healthcare providers, payers, and their technology vendors, the past few years were a headlong race to adopt FHIR. Here are three reasons why FHIR is hugely significant for healthcare technology leaders: It’s a federal mandate from the Centers for Medicare & Medicaid Services. It’s a complex data integration challenge. Legacy systems built before the mid 2010s are not interoperable with the FHIR mandate. FHIR implementation approaches For large organizations with huge data requirements, data architects can experience paralysis from the sheer volume of legacy systems to unwind. These groups have all of their patients’ electronic healthcare record information, payer information and more bound up in legacy systems, none of which is interoperable with FHIR. The second challenge is cloud migration, which can be skirted by organizations using a checkbox compliance approach. In those cases, API layers are used to ingest and serve data to legacy systems, but are not really integrated with the legacy system in real time. The most successful approach to tackling this challenge is not to rewrite, unwind or replace legacy systems completely, but keep them contained. We recommend bringing in an operational data layer that exposes the information in the legacy system and keeps it in sync with the legacy system, but then lands it in an ODL in the FHIR standard. With the FHIR API, patients and providers can interact with data in real time and access records in milliseconds after a diagnosis. Real-time records synced with legacy systems and patients’ private data is protected. Delve into the full conversation below Your browser does not support the audio element. FHIR and the future of healthcare at Humana You don't have to take the rip and replace approach when modernizing your legacy systems with an ODL method. This was a key to successful modernization for Humana, as discussed in the third and final episode in our series. For large enterprises that may have decades’ worth of acquired legacy systems, often pulling similar datasets from disparate databases, the pursuit of modernized interoperability begins to look like an impossible task. Listen to the final episode of our podcast series to here how Humana’s ODL approach met the company’s data velocity requirements, and next steps for personalized healthcare and interoperability at Humana. Listen to the entire conversation below Your browser does not support the audio element. More related FHIR and healthcare resources [ White paper ] Bring the FHIR Inside: Digital Transformation Without the Rip and Replace [ On-demand webinar ] Building FHIR Applications with MongoDB