MongoDB Applied

Customer stories, use cases and experience

Migrate Your Mindset to the Cloud Along with Your Data — A Conversation with Mark Porter and Accenture’s Michael Ljung

The challenges of getting data and applications into the cloud are well-known. Technology isn’t always the hardest part of cloud migration, however, and it won’t produce digital transformation on its own. In many cloud migrations, both people and processes need to change along with the technology. That’s because the processes that work in a legacy environment won’t necessarily help an organization thrive in the cloud. Instead, the opposite can happen: Legacy procedures tend to produce legacy results, making it difficult to achieve the impact that so many organizations seek from cloud-based digital transformation.

As part of an ongoing series on cloud migration and digital transformation, MongoDB CTO Mark Porter sat down with Michael Ljung, Accenture’s Global Engineering Lead, to examine new approaches and new ways of thinking that can be crucial to success in the cloud.

Experience and perspective

During their conversation, both Porter and Ljung recounted situations during which they, and their organizations, were called upon to partner in a new way with clients that were struggling to migrate to the cloud. Each knew that their experience and perspective could lead to success for these organizations. They also knew that their message — that sometimes it’s necessary to go slow to go fast — might not find receptive ears.

When organizations bring their old procedures and old deployment technologies to the cloud, “They’re in two places for a little while,” Porter says. Their data and applications may be in the cloud, but their mindset is on premises. Porter says that the MongoDB team helped a large cryptocurrency exchange through this exact situation. MongoDB helped the exchange get through the learning curve associated with new technologies, acting as an embedded member of the team and even guiding them in setting quarterly goals for their migrations.

Ljung described a large government client that wanted to move to the cloud and do it quickly. The organization embraced agile methodologies but didn’t have the automation or the experience with CI/CD to support cloud development. They were releasing new code to production almost daily, but a fix in one place could easily cause a breaking change elsewhere, and often did.

Digital done right

The solution was to take a step back. Accenture started by supporting the organization in mastering incremental delivery. Next came some basic automation. With that in place, the organization was able to return to agile methodologies and organize themselves into sprints. Now, Ljung says, “This client is an example of digital transformation done right” — all because, as he and Porter agreed, they were willing to go slow to go fast.

Watch the full video series with Mark Porter and Michael Ljung to learn more about the strategies that support successful cloud migration and digital transformation.

September 27, 2022
Applied

4 Ways to Create a Zero Trust Environment in Financial Services

For years, security professionals protected their IT much like medieval guards protected a walled city — they made it as difficult as possible to get inside. Once someone was past the perimeter, however, they had generous access to the riches within. In the financial sector, this would mean access to personally identifiable information (PII), including a “marketable data set” of credit card numbers, names, social security information, and more. Sadly, such breaches occurred in many cases, adversely affecting end users. A famous example is the 2017 Equifax incident, in which a breach exposed the personal data of roughly 147 million people and led to years of unhappy customers.

Since then, the security mindset has changed as users increasingly access networks and applications from any location, on any device, on platforms hosted in the cloud — the classic point-to-point security approach is obsolete. The perimeter has changed, so reliance on it as a protective barrier has changed as well. Given the huge amount of confidential client and customer data that the financial services industry deals with on a daily basis — and the strict regulations — security needs to be an even higher priority. The perceived value of this data also makes financial services organizations a primary target for data breaches. In this article, we’ll examine a different approach to security, called zero trust, that can better protect your assets.

Paradigm shift

Zero trust presents a new paradigm for cybersecurity. In a zero trust environment, the perimeter is assumed to have been breached; there are no trusted users, and no user or device gains trust simply because of its physical or network location. Every user, device, and connection must be continually verified and audited. Here are four concepts to know about creating a zero trust environment.

1. Securing the data

Although ensuring access to banking apps and online services is vital, the database, which is the backend of these applications, is a key part of creating a zero trust environment. The database contains much of an organization’s sensitive, and regulated, information, along with data that may not be sensitive but is critical to keeping the organization running. Thus, it is imperative that a database be ready and able to work in a zero trust environment. As more databases become cloud-based services, an important aspect is ensuring that the database is secure by default — meaning it is secure out of the box. This approach takes some of the responsibility for security out of the hands of administrators, because the highest levels of security are in place from the start, without requiring attention from users or administrators. To allow access, users and administrators must proactively make changes — nothing is automatically granted.

As more financial institutions embrace the cloud, securing data can get more complicated. Security responsibilities are divided between the clients’ own organization, the cloud providers, and the vendors of the cloud services being used. This approach is known as the shared responsibility model. It moves away from the classic model where IT owns hardening of the servers and security and then needs to harden the software on top — for example, the version of the database software — and then harden the actual application code. In this model, the hardware (CPU, network, storage) is solely in the realm of the cloud provider that provisions these systems. The service provider for a Data-as-a-Service model then delivers a hardened database to the client with a designated endpoint.
Only then do the actual client team and their application developers and DevOps team come into play for the actual solution. Security and resilience in the cloud are only possible when everyone is clear on their roles and responsibilities. Shared responsibility recognizes that cloud vendors ensure that their products are secure by default, while still available, but also that organizations take appropriate steps to continue to protect the data they keep in the cloud.

2. Authentication for customers and users

In banks and finance organizations, there is a lot of focus on customer authentication, or making sure that accessing funds is as secure as possible. It’s also important, however, to ensure secure access to the database on the other end. An IT organization can use various methods to allow users to authenticate themselves to a database. Most often, the process includes a username and password. But, given the increased need for financial services organizations to maintain the privacy of confidential customer information, this step should only be viewed as a base layer. At the database layer, it is important to have transport layer security (TLS) and SCRAM authentication, which enable traffic from clients to the database to be authenticated and encrypted in transit.

Passwordless authentication should also be considered — not just for customers, but for internal teams as well. This can be done in multiple ways with the database; for example, auto-generated certificates may be required to access the database. Advanced options exist for organizations that already use X.509 certificates and have a certificate management infrastructure.

3. Logging and auditing

In the highly regulated financial industry, it is also important to monitor your zero trust environment to ensure that it remains in force and encompasses your database. The database should be able to log all actions or have functionality to apply filters to capture only specific events, users, or roles. Role-based auditing lets you log and report activities by specific roles, such as userAdmin or dbAdmin, coupled with any roles inherited by each user, rather than having to extract activity for each individual administrator. This approach makes it easier for organizations to enforce end-to-end operational control and maintain the insight necessary for compliance and reporting.

4. Encryption

With large amounts of valuable data, financial institutions also need to make sure that they are embracing encryption — in flight, at rest, and even in use. Securing data with client-side, field-level encryption allows you to move to managed services in the cloud with greater confidence. The database only works with encrypted fields, and organizations control their own encryption keys, rather than having the database provider manage them. This additional layer of security enforces an even more fine-grained separation of duties between those who use the database and those who administer and manage it.

Also, as more data is transmitted and stored in the cloud — some of it highly sensitive workloads — additional technical options to control and limit access to confidential and regulated data are needed. However, this data still needs to be used. So, ensuring that in-use data encryption is part of your zero trust solution is vital. This approach enables organizations to confidently store sensitive data, meeting compliance requirements while also enabling different parts of the business to gain access and insights from it.
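To make the authentication layer more concrete, here is a minimal sketch of passwordless, certificate-based access using the MongoDB Node.js driver. It assumes TLS is enforced and that X.509 client certificates have already been issued; the cluster hostname and certificate path are placeholders, not a real deployment.

```javascript
// Minimal sketch: TLS in transit plus X.509 client-certificate authentication,
// so no password ever crosses the wire. Hostname and file path are placeholders.
const { MongoClient } = require("mongodb");

const client = new MongoClient(
  "mongodb+srv://cluster0.example.mongodb.net/?authMechanism=MONGODB-X509",
  {
    tls: true,
    tlsCertificateKeyFile: "/etc/ssl/client.pem", // client certificate + private key
  }
);

async function run() {
  await client.connect();
  // The subject of the X.509 certificate is the authenticated principal.
  const accounts = client.db("payments").collection("accounts");
  console.log(await accounts.countDocuments());
  await client.close();
}

run().catch(console.error);
```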
Conclusion

In a world where security of data is only becoming more important, financial services organizations rank among those with the most to lose if data gets into the wrong hands. Ditching the perimeter mentality and moving toward zero trust — especially as more cloud and as-a-service offerings are embedded in infrastructure — is the only way to truly protect such valuable assets. Learn more about developing a strategic advantage in financial services. Read the ebook now.

September 26, 2022
Applied

Relational to NoSQL at Enterprise Scale: Lessons from Amazon

When most people think about Amazon, they think of the cloud. But the company was founded more than a decade before anyone was talking about the cloud. In fact, by 2002, when Amazon founder Jeff Bezos wrote a now-famous internal email directing all new software development to be designed around service-oriented architecture, Amazon was already a $5 billion enterprise. In 2017, Amazon was generating more than 50 times that annual revenue, and like many enterprise organizations, the core of that revenue was driven by the monolithic services that formed the backbone of the business. Those monoliths didn’t go away overnight, and in 2017 and 2018, Amazon kicked off a massive RDBMS-to-NoSQL migration project called “Rolling Stone” to move about 10,000 RDBMS-backed microservices as well as decompose the remaining monoliths into microservices backed by NoSQL. Amazon chose to use its own NoSQL database, but the lessons from that huge effort are valuable for any migration to a NoSQL or document database. In this article, I’ll share some of the insights gained about when and how to use NoSQL.

RDBMS costs

At the time of this migration, I ran the NoSQL Blackbelt Team for Amazon’s retail business, which was the center of excellence for the business and which developed most of the design patterns and best practices that Amazon uses to build NoSQL-backed application services today. In 2017, Amazon had more than 3,000 Oracle server instances, 10,000 application services and 25,000 global developers, and almost the entire development team was well versed in relational database technology. The cost of the IT infrastructure driving the business, however, was spiraling out of control. As the team started to look for root causes, they quickly realized that the cost of the relational database management system (RDBMS) was a big line item. The infrastructure required to support RDBMS workloads was enormous and did not scale well to meet the needs of the company’s high-demand services. Amazon had the biggest Oracle license and the largest RAC deployments in the world, and the cost and complexity of scaling services built on RDBMS was negatively affecting the business.

As a result, we started looking at what we were actually doing in these databases. A couple of interesting things came out. We found that 70% of the access patterns that we were running against the data involved a single row of data on a single table. Another 20% were on a range of rows on a single table. So, we weren’t running complex queries against the data at high velocity. In fact, the vast majority were just inserts and updates, but many of those were executed “transactionally” across disparate systems using two-phase commits to ensure data consistency. Additionally, the cost was very high for the other 10% of the access patterns because most were complex queries requiring multiple table joins.

Technology triggers

While the team was looking into these issues, they also noticed a trend in the industry: Per-core CPU performance was flattening, and the server processor industry was not investing enough in 5 nm fabrication technology to meet the efficiency increases described by Moore’s Law. This is one of the reasons why Amazon built its own processor. If you look at the history of data processing, you’ll see a series of peaks and valleys in what can be defined as “data pressure,” or the ability of a system to process the required amount of data at a reasonable cost and within a reasonable amount of time.
When one of these dimensions is broken, it defines a “technology trigger” that signals the need to invent something. At Amazon, we saw that the cost efficiency of the relational database was declining while the TCO of high time-complexity queries was increasing as a result. Something had to change. Relational data platforms only scale well vertically, which means getting a bigger box. Sooner or later, there is no bigger box, and the options to scale an RDBMS-backed system introduce either design complexity or time complexity. Sharding RDBMS systems is hard to self-manage. And, although distributed SQL insulates users from that complexity by providing things like distributed cross commits behind the API to maintain consistency, that insulation also comes at a cost, which can be measured in the time complexity of the queries running across the distributed backend. At the same time, the cost of storage was falling, and the promise of denormalized, low time-complexity queries in NoSQL was enticing to say the least. Clearly, it was never going to get any cheaper to operate a relational database; it was only going to get more expensive. Thus, Amazon made the decision to undertake what may be the largest technology migration ever attempted and deprecate RDBMS technology in favor of NoSQL for all Tier 1 services.

A new approach to building NoSQL skills

Project Rolling Stone launched with great fanfare and had buy-in from all the right stakeholders. But things didn’t go well at first. Amazon’s developers were now using a database designed to operate without the complex queries they had always relied on, and the lack of in-house NoSQL data modeling expertise was crippling the migration effort. The teams lacked the skills needed to design efficient data models, so the early results from prototyped solutions were far worse than anticipated. To correct this situation, leadership created a center of excellence to define best practices and educate the broad Amazon technical organization; the NoSQL Blackbelt Team was formed under my leadership.

The challenge before us was enormous. We had limited resources with global scope across an organization of more than 25,000 technical team members. The traditional technical training approach built on workshops, brown bags and hackathons did not deliver the required results because the Amazon organization lacked a core nucleus of NoSQL skills to build on. Additionally, traditional training tends to be sandboxed around canned problems that are often not representative of what the developers are actually working on. As a result, technical team members were completing those exercises without significant insight into how to use NoSQL for their specific use cases.

To correct this situation, we reworked the engagement model. Instead of running workshops and hackathons, we used the actual solutions the teams were working on as the learning exercises. The Blackbelt Team executed a series of focused engagements across Amazon development centers, where we delivered technical brown bag sessions to advocate best practices and design patterns. Instead of running canned workshops, however, we scheduled individual design reviews with teams to discuss their specific workloads and prototype a data model they could then iterate on. The result was powerful. Teams gained actionable information they could build on, rather than general knowledge that might or might not be relevant to their use case.
During the next three years, Amazon migrated all Tier 1 RDBMS workloads to NoSQL and reduced the infrastructure required to support those services by more than 50%, while still maintaining a high business growth rate.

Watch Rick Houlihan’s full MongoDB World 2022 presentation, “From RDBMS to NoSQL at Enterprise Scale.”

When to use NoSQL - Looking at Access Patterns

When should you use NoSQL? I had to answer this question many times at Amazon, and the answer isn’t so clear-cut. A relational database is agnostic to the access pattern. It doesn’t care what questions you ask. You don’t have to know code, although some people would argue that SQL is code. You can theoretically ask a simple question and get your data. Relational systems do that by being agnostic to every access pattern and by optimizing for none of them. The reality is that the code we write never asks random questions. When you write code, you’re doing it to automate a process — to run the same query a billion times a day, not to run a thousand random queries. Thus, if you understand the access patterns, you can start doing things with the data to create structures that are much easier for systems to retrieve while doing less work. This is the key. The only way to reduce the cost of data processing and the amount of infrastructure deployed is to do less work.

OLTP (online transaction processing) applications are really the sweet spot for NoSQL databases. You’ll see the most cost efficiency here because you can create data models that mirror your access patterns and representative data structures that mirror your objects in the application layer. The idea is to deliver a system that is very fast at the high-velocity access patterns that make up the majority of your workload. I talk more about data access patterns and data modeling at a recent Ask Me Anything.

Making It All Work

There’s a saying that goes, “Data is like garbage. You better know what you are going to do with it before you collect it.” This is where relationships come into play. Nonrelational data, to me, does not exist. I’ve worked with more than a thousand customers and workloads, and I’ve never seen an example of nonrelational data. When I query data, relationships become defined by the conditions of my query. Every piece of data we’re working with has some sort of structure. It has schema, and it has relationships; otherwise, we wouldn’t care about it. No matter what application you’re building, you’re going to need some kind of entity relationship diagram (ERD) that describes your logical data model, the entities, and how they’re related, in order to understand how to model it. Otherwise, you’re just throwing a bunch of bytes in a bucket and randomly selecting things.

A relationship always exists between these things. In relational models, they’re typically modeled in third normal form (3NF). For example, in a typical product catalog, you’ll see one-to-one relationships between products and books, products and albums, and products and videos; one-to-many relationships between albums and tracks; and many-to-many relationships between videos and actors. This is a pretty simple ERD — we’re not even talking about any complex patterns. But suppose you want to get a list of all your products: you’d have to run three different queries with various levels of joins. That’s a lot of things going on. In a NoSQL database, you’re going to take all those rows and collapse them into objects.
If you think about the primary access pattern of this workload, it’s going to be something like, “Get me the product by this ID,” or “Get me all the titles under this category.” Whenever you want the product, you typically want all the data for the product because you’re going to use it in a product window. If you put it all in one document, you no longer have to join those documents or rows. You can just fetch the data by product ID. If you think about what’s happening from a time-complexity perspective, when you have all that data in tables, your one-to-one joins won’t be so bad, but with a one-to-many, the time complexity starts expanding. Again, the examples mentioned here are fairly simple. When you start getting into nested joins, outer and inner, and other more complex SQL statements, you can imagine how much worse the time complexity becomes. That’s your CPU burning away, assembling data across tables. If you’re running a relational data model and you’re joining tables, that’s a problem.

Index and conquer

Let’s think about how we model those joins in NoSQL. To start, we have a key-value lookup on a product. But we can also create an array of embedded documents called “target” that contains all the things the product is related to, as shown in Figure 1. It contains metadata and anything about the product you need when you query by product ID. Now that we’re using embedded documents, there’s no more time complexity. It’s all an index lookup. It can be one-to-one, one-to-many, many-to-many — it doesn’t matter. As long as the aim is “get the document,” it’s still an index lookup.

Figure 1: Creating an array called “target” eliminates the need to join data from different rows, columns or tables.

Of course, a lot more goes into an application than an index lookup. Remember, 70% of our access patterns at Amazon were for a single row of data, and 20% were for a range of rows on a single table. For more complex access patterns, we’re going to need more dimensions. If, for example, we’re running a query for all the books by a given author or all people related to “x,” this will require adding more dimensions, or documents, to the collection. We can create documents for other actors who were in a collection of movies, directors of all the movies, songs from the movies, how these songs relate to other entities in this collection, writers, producers, artists who performed the songs and all the albums those songs appeared on, as shown in Figure 2.

Figure 2: Create more dimensions by adding documents to the collection.

Now, if I index the “target” array — which is one of the best things about MongoDB and document databases, multikey arrays — I can create essentially a B-tree lookup structure of those “target” IDs and join all those documents and all of those dimensions, as shown in Figure 3. Now I can select, for example, where target ID is Mary Shelley and get everything she’s related to — the books, people, critiques of her work. Where the target ID is a song title, I can get all the information about that song.

Figure 3: Multikey arrays create what is essentially a B-tree lookup structure, joining all related documents.

Essentially, we’re using the index as a join mechanism, which is a critical distinction in NoSQL. At AWS, many teams came to me and told me that NoSQL doesn’t work.
The key thing to understand, however, is that if you index documents that are stored in the same table or collection on a common dimension that has the same value, you’ve essentially eliminated the need to join that same index across that same value and across multiple tables. That’s what the relational database does. You don’t want to join unindexed columns in a relational database because it will incur a lot of overhead. You want to index those attributes and put pointers to parent objects and child tables and then join on those IDs. With NoSQL, we’re essentially placing all those items in a single table and indexing on the ID. This approach also eliminates the time complexity. If all those documents share a common table, and they’re indexed on a common attribute, the time complexity is O(log N). Seventy percent of the overhead of handling a request from a database is not getting the data. It’s managing the connection, marshaling the data and moving it back and forth across the TCP/IP stack. So, if I can eliminate one request from a transaction, I’m going to reduce the overhead of that transaction.

Conclusion

Data that is accessed together should be stored together. That is the mantra that we’ve always espoused at MongoDB. Once we started learning how to use NoSQL at Amazon, we started having better results. We did that through regularly scheduled training sessions where we could teach the fundamentals of NoSQL using our own workloads. That’s what my developer advocacy team at MongoDB does now with customers. We provide templates for how to model data for their workloads to help them do it themselves.
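As a rough illustration of the single-collection pattern described above (a hedged sketch with made-up collection, ID, and field names, not Amazon’s actual schema), here is how the “target” array and its multikey index might look in mongosh:

```javascript
// Related entities live in one collection and share a "target" array; a
// multikey index on that array turns "joins" into a single index lookup.
db.catalog.insertMany([
  {
    _id: "book#frankenstein",
    type: "book",
    title: "Frankenstein",
    target: [{ id: "author#mary-shelley", relation: "wrote" }],
  },
  {
    _id: "critique#1818-review",
    type: "critique",
    target: [{ id: "author#mary-shelley", relation: "about" }],
  },
]);

// Multikey index on the embedded array: the "join" mechanism.
db.catalog.createIndex({ "target.id": 1 });

// One indexed query returns everything related to Mary Shelley:
// books, critiques, and any other documents that point at her.
db.catalog.find({ "target.id": "author#mary-shelley" });
```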

September 26, 2022
Applied

MACH Aligned for Retail: Cloud-Native SaaS

MongoDB is an active member of the MACH Alliance, a non-profit cooperation of technology companies fostering the adoption of composable architecture principles promoting agility and innovation. Each letter in the MACH acronym corresponds to a different concept that should be leveraged when modernizing heritage solutions and creating brand-new experiences. MACH stands for Microservices, API-first, Cloud-native SaaS, and Headless. In previous articles in this series, we explored the importance of Microservices and the API-first approach. Here, we will focus on the third principle championed by the alliance: Cloud-native SaaS. Let’s dive in.

What is cloud-native SaaS?

Cloud-native SaaS solutions are vendor-managed applications developed in and for the cloud, and leveraging all the capabilities the cloud has to offer, such as fully managed hosting, built-in security, auto-scaling, cross-regional deployment, automatic updates, built-in analytics, and more.

Why is cloud-native SaaS important for retail?

Retailers are pressed to transform their digital offerings to meet rapidly shifting consumer needs and remain competitive. Traditionally, this means establishing areas of improvement for your systems and instructing your development teams to refactor components to introduce new capabilities (e.g., analytics engines for personalization or mobile app support) or to streamline architectures to make them easier to maintain (e.g., moving from monolith to microservices). These approaches can yield good results but require a substantial investment in time, budget, and internal technical knowledge to implement.

Now, retailers have an alternative tool at their disposal: Cloud-native SaaS applications. These solutions are readily available off-the-shelf and require minimal configuration and development effort. Adopting them as part of your technology stack can accelerate the transformation and time to market of new features, while not requiring specific in-house technical expertise. Many cloud-native SaaS solutions focused on retail use cases are available (see Figure 1), including Vue Storefront, which provides a front-end presentation layer for ecommerce, and Amplience, which enables retailers to customize their digital experiences.

Figure 1: Some MACH Alliance members providing retail solutions.

At the same time, in-house development should not be totally discarded, and you should aim to strike the right balance between the two options based on your objectives. Figure 2 shows pros and cons of the two approaches.

Figure 2: Pros and cons of cloud-native SaaS and in-house approaches.

MongoDB is a great fit for cloud-native SaaS applications

MongoDB’s product suite is cloud-native by design and is a great fit if your organization is adopting this principle, whether you prefer to run your database on-premises, leveraging MongoDB Community and Enterprise Advanced, or as SaaS with MongoDB Atlas. MongoDB Atlas, our developer data platform, is particularly suitable in this context. It supports the three major cloud providers (AWS, GCP, Azure) and leverages the cloud platforms’ features to achieve cloud-native principles and design:

- Auto-deployment & auto-healing: DB clusters are provisioned, set up, and healed automatically, reducing operational and DBA efforts.
- Automatically scalable: Built-in auto-scaling capabilities enable the database RAM, CPU, and storage to scale up or down depending on traffic and data volume.
  A MongoDB Serverless instance allows abstracting the infrastructure even further, by paying only for the resources you need.
- Globally distributed: The global nature of the retail industry requires data to be efficiently distributed to ensure high availability and compliance with data privacy regulations, such as GDPR, while implementing strict privacy controls. MongoDB Atlas leverages the flexibility of the cloud with its replica set architecture and multi-cloud support, meaning that data can be easily distributed to meet complex requirements.
- Secure from the start: Network isolation, encryption, and granular auditing capabilities ensure data is only accessible to authorized individuals, thereby maintaining confidentiality.
- Always up to date: Security patches and minor upgrades are performed automatically with no intervention required from your team. Major releases can be integrated effortlessly, without modifying the underlying OS or working with package files.
- Monitorable and reliable: MongoDB Atlas provides a set of utilities that deliver real-time reporting of database activities to monitor and improve slow queries, visualize data traffic, and more. Backups are also fully managed, ensuring data integrity.

Independent Software Vendors (ISVs) increasingly rely on capabilities like these to build cloud-native SaaS applications addressing retail use cases. For example, Commercetools offers a fully managed ecommerce platform underpinned by MongoDB Atlas (see Figure 3). Their end-to-end solution provides retailers with the tools to transform their ecommerce capabilities in a matter of days, instead of building a solution in-house. Commercetools is also a MACH Alliance member, fully embracing the composable architecture paradigms explored in this series. Adopting Commercetools as your ecommerce platform of choice lets you automatically scale your ecommerce as traffic increases, and it integrates with many third-party systems, ranging from payment platforms to front-end solutions. Additionally, its headless nature and strong API layer allow your front-end to be adapted based on your brands, currencies, and geographies.

Commercetools runs on and natively ingests data from MongoDB. Leveraging MongoDB for your other home-grown applications means that you can standardize your data estate, while taking advantage of the many capabilities that the MongoDB data platform has to offer. The same principles can be applied to other SaaS solutions running on MongoDB.

Figure 3: MongoDB Atlas and Commercetools capabilities.

Find out more about the MongoDB partnership with Commercetools. Learn how Commercetools enabled Audi to integrate its in-car commerce solution and adapt it to 26 countries.

MongoDB supports your home-grown applications

MongoDB offers a powerful developer data platform, providing the tools to leverage composable architecture patterns and build differentiating experiences in-house. The same benefits of MongoDB’s cloud-native architecture explored earlier are also applicable in this context and are leveraged by many retailers globally, such as Conrad Electronics, running their B2B ecommerce platform on MongoDB Atlas.

Summary

Cloud-native principles are an essential component of modern systems and applications. They support ISVs in developing powerful SaaS applications and can be leveraged to build proprietary systems in-house. In both scenarios, MongoDB is strongly positioned to deliver on the cloud-native capabilities that should be expected from a modern data platform.
Stay tuned for our final blog of this series on Headless and check out our previous blogs on Microservices and API-first.

September 22, 2022
Applied

How a Data Mesh Facilitates Open Banking

Open banking shows signs of revolutionizing the financial world. In response to pressure from regulators, consumers, or both, banks around the world continue to adopt the central tenet of open banking: Make it easy for consumers to share their financial data with third-party service providers and allow those third parties to initiate transactions. To meet this challenge, banks need to transition from sole owners of financial data and the customer relationship to partners in a new, distributed network of services. Instead of competing with other established banks, they now compete with fintech startups and other non-bank entities for consumer attention and the supply of key services. Despite fundamental shifts in both the competition and the customer relationship, however, open banking offers a huge commercial opportunity, which we’ll look at more closely in this article. After all, banks still hold the most important currency in this changing landscape: trust.

Balancing data protection with data sharing

Established banks hold a special position in the financial system. Because they are long-standing, heavily regulated, and backed by government agencies that guarantee deposits (e.g., the FDIC in the United States), established banks are trusted by consumers over fintech startups when it comes to making their first forays into open banking. A study by Mastercard of 4,000 U.S. and Canadian consumers found that the majority (55% and 53%, respectively) strongly trusted banks with their financial data. Only 32% of U.S. respondents and 19% of Canadians felt the same way about fintech startups.

This position of trust extends to the defensive and risk-averse stance of established banks when it comes to sharing customer data. Even when sharing data internally, these banks have strict, permission-based data access controls and risk-management practices. They also maintain extensive digital audit trails. Open banking challenges these traditional data access practices, however, causing banks to move to a model where end customers are empowered to share their sensitive financial data with a growing number of third parties. Some open banking standards, such as Europe’s Payment Services Directive (PSD2), specifically promote informed consent data sharing, further underlining the shift to consumers as the ultimate stewards of their data.

At the same time, banks must comply with evolving global privacy laws, such as Europe’s General Data Protection Regulation (GDPR). These laws add another layer of risk and complexity to data sharing, granting consumers (or “data subjects” in GDPR terms) the right to explicit consent before data is shared, the right to withdraw that consent, data portability rights, and the right to erasure of that data — the famed “right to be forgotten.” In summary, banks are under pressure from regulators and consumers to make data more available, and customers now make the final decision about which third parties will receive that data. Banks are also responsible for managing:

- Different levels of consent for different types of data
- The ability to redact certain sensitive fields in a data file, while still sharing the file
- Compliance with data privacy laws, including "the right to be forgotten"

The open opportunity for banks

In spite of the competition and added risks for established banks, open banking greatly expands the global market of customers, opens up new business models and services, and creates new ways to grow customer relationships.
In an open banking environment, banks can leverage best-of-breed services from third parties to bolster their core banking services and augment their online and mobile banking experiences. Established banks can also create their own branded or “white label” services, like payment platforms, and offer them as services for others to use within the open banking ecosystem. For customers, the ability of third parties to get access to a true 360-degree view of their banking and payment relationships creates new insights that banks would not have been able to generate with just their own data.

Given the risks, and the huge potential rewards, how do banks satisfy the push and pull of data sharing and data protection? How do they systematically collect, organize, and publish the most relevant data from across the organization for third parties to consume? Banks need a flexible data architecture that enables the deliberate collection and sharing of customer data both internally and externally, coupled with fine-grained access, traceability, and data privacy controls down to the individual field level. At the same time, this new approach must also provide a speed of development and flexibility that limits the cost of compliance with these new regulations and evolving open banking standards.

Rise of the data mesh

Open banking requires a fundamental change in a bank’s data infrastructure and its relationship with data. The technology underlying the relational databases and mainframes in use at many established banks was first developed in the 1970s. Conceived long before the cloud computing era, these technologies were never intended to support the demands of open banking, nor the volume, variety, and velocity of data that banks must deal with today. Banks are overcoming these limitations and embracing open banking by remodeling their approach to data and by building a data mesh using a modern developer data platform.

What is a data mesh?

A data mesh is an architectural framework that helps banks decentralize their approach to sharing and governing data, while also enabling self-service consumption of that data. It achieves this by grouping a bank’s data into domains. Each domain in a data mesh contains related data from across the bank. For example, a "consumer" domain may contain data about accounts, addresses, and relationship managers from across every department of the bank. Each data domain is owned by a different internal stakeholder group or department within the bank, and these owners are responsible for collecting, cleansing, and distributing the data in their domain across the enterprise and to consumers. With open banking, domain owners are also responsible for sharing data to third parties.

This decentralized, end-to-end approach to data ownership encourages departments within the bank to adopt a “product-like” mentality toward the data within their domain, ensuring that it is maintained and made available like any other service or product they deliver. For this reason, the term data-as-a-product is synonymous with data mesh. Data domain owners are also expected to:

- Create and maintain relevant reshaped copies of data, rather than pursue a single-source-of-truth or canonical model.
- Serve data by exposing data product APIs. This means doing the cleansing and curation of data as close as possible to the source, rather than moving data through complex data pipelines to multiple environments.
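To make the "consumer" domain example more concrete, here is a hedged sketch (all collection, field, and system names are assumptions, not a reference model) of how a domain-owned document might look in MongoDB, with data gathered from several source systems and consent recorded alongside it:

```javascript
// Hypothetical document in a "consumer" domain collection. Each embedded piece
// of data records the source system that "owns" it, and consent is stored with
// the data so the domain owner can enforce what may be shared with third parties.
db.consumerDomain.insertOne({
  customerId: "C-1029",
  profile: { name: "A. Example", segment: "retail", sourceSystem: "crm" },
  accounts: [
    { accountId: "ACC-001", type: "checking", sourceSystem: "core-banking" },
    { accountId: "ACC-014", type: "credit-card", sourceSystem: "cards" },
  ],
  addresses: [{ type: "home", country: "DE", sourceSystem: "crm" }],
  relationshipManager: { id: "RM-77", sourceSystem: "crm" },
  consent: {
    // third parties the customer has authorized, and for which data scopes
    thirdParties: [
      { name: "budgeting-app", scopes: ["accounts"], grantedAt: new Date() },
    ],
  },
});
```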
The successful implementation of a data mesh, and the adoption of a data-as-a-product culture, requires a fundamental understanding of localized data. It also requires proper documentation, design, management, and, most important, flexibility, as in the ability to extend the internal data model. The flexibility of the document model is, therefore, critical for success.

Conclusion

Open banking holds great potential for the future of the customer experience, and will help established financial institutions meet ever-evolving customer expectations. Facilitated by a data mesh, you can open new doors for responsible, efficient data sharing across your financial institution, and this increase in data transparency leads to better outcomes for your customers — and your bottom line. Want to learn more about the benefits of open banking? Watch the panel discussion Open Banking: Future-Proof Your Bank in a World of Changing Data and API Standards.

September 22, 2022
Applied

How to Leverage Enriched Queries with MongoDB 6.0

MongoDB introduces useful new functions and features with every release, and MongoDB 6.0, released this summer, offers many notable improvements, including deeper insights from enriched queries via the MongoDB Query API. This set of query enhancements was announced at MongoDB World 2022 by senior product manager Katya Kamenieva. You can watch her presentation below.

Watch Katya Kamenieva’s MongoDB World presentation on queries.

Users can now take advantage of upgraded operators and change stream features. In this post, we’ll look at several of these updates, along with examples of how you can put them to use.

Top N accumulators

With this new feature, users can compute the top items in each group based on sort criteria ($topN, $bottomN), the current order of documents ($firstN, $lastN), or the value of a field ($maxN, $minN). This functionality would be useful, for example, if you have a collection of restaurants with ratings, and you want to see the top three highest-rated restaurants for each type of cuisine. You can group by cuisine and use $topN to return the top three restaurants by rating.

Ability to sort arrays

The new $sortArray expression allows users to sort the elements in an array. For example, suppose you have posted content with hundreds of user comments, and you want to sort the comments based on how many likes they received. In this case, $sortArray can pull those comments and prioritize them to the top of the comments list.

Densification and gap-filling

These new additions to the aggregation framework help to build out time series data more completely. When attempting to create histograms of data over time, the new stages, $densify and $fill, allow you to fill gaps in that data to create smoother and more complete graphs using linear interpolation, last/next observed value carried forward, or a constant value. This capability can be helpful, for example, if you want to create a graph that shows the amount of inventory in a warehouse every day for a year, but the inventory was only recorded once a week. The $densify stage will fill the gaps in the timeline, while $fill will produce values for the inventory data based on the previous observation.

Joining sharded collections

With this new feature, when joining collections using $lookup or performing a recursive search with $graphLookup, the collections on both sides can be sharded. Before 6.0, only the originating collection could be sharded. An example use case is enriching records in an “accounts” collection with the list of corresponding orders stored in an “orders” collection. In the past, only the “accounts” collection could be sharded. Starting with 6.0, both the “accounts” and “orders” collections can be sharded.

Change streams pre- and post-images

Change streams now offer point-in-time (PIT) pre- and post-image capabilities, allowing users to include the state of the document before and after changes in the output of the change stream. This functionality can be useful in many situations. For example, suppose a company is tracking flight times. If a flight is delayed, the system can compare the value of the departure and arrival times both before and after that delay and trigger an automatic rewrite of the schedule for the new flight timeline, including schedules for the entire crew.

Atlas Search across multiple collections

This improvement to MongoDB Atlas Search allows users to search across multiple collections with a single query by using $search inside the $unionWith or $lookup stages. $search can provide these results quickly, using only one query.

Enriched queries are not the only improvements in MongoDB 6.0. Read about the 7 reasons to upgrade to MongoDB 6.0 and discover the possibilities.

Try MongoDB Atlas for Free Today
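For illustration, here are minimal mongosh sketches of the accumulator and array-sorting examples above; the collection and field names ("restaurants", "rating", "posts", "comments", "likes") are assumptions:

```javascript
// Top three highest-rated restaurants per cuisine, using the $topN accumulator (6.0+).
db.restaurants.aggregate([
  {
    $group: {
      _id: "$cuisine",
      topRated: {
        $topN: { n: 3, sortBy: { rating: -1 }, output: ["$name", "$rating"] },
      },
    },
  },
]);

// Sort each post's embedded comments array by likes, descending, with $sortArray (6.0+).
db.posts.aggregate([
  {
    $project: {
      title: 1,
      comments: { $sortArray: { input: "$comments", sortBy: { likes: -1 } } },
    },
  },
]);
```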
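A similar sketch for the warehouse inventory example, assuming documents with "date" and "quantity" fields recorded roughly once a week:

```javascript
// Densify the weekly series into daily points, then carry the last observed
// quantity forward ($densify and $fill, 6.0+).
db.inventory.aggregate([
  { $densify: { field: "date", range: { step: 1, unit: "day", bounds: "full" } } },
  { $fill: { sortBy: { date: 1 }, output: { quantity: { method: "locf" } } } },
]);
```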
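And a hedged sketch of the flight-tracking example: enabling pre- and post-images on a collection (the "flights" name is an assumption) and opening a change stream that returns both versions of a changed document:

```javascript
// Enable pre- and post-images for the collection (6.0+).
db.runCommand({
  collMod: "flights",
  changeStreamPreAndPostImages: { enabled: true },
});

// The change stream now carries the document as it looked before and after
// each update, so departure and arrival times can be compared directly.
const cursor = db.flights.watch([], {
  fullDocumentBeforeChange: "whenAvailable",
  fullDocument: "updateLookup",
});
```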

September 20, 2022
Applied

Built With MongoDB: Vanta Automates Security and Compliance for Fast-Growing Businesses

Organizations pay a high price for running afoul of regulations. Several eight- and nine-figure fines have already been issued for GDPR violations in the four years since the far-reaching privacy regulation went into effect. Although the biggest fines are reserved for the biggest offenders, small businesses and startups, which can least afford financial and reputational setbacks, have no choice but to take compliance seriously.

San Francisco-based startup Vanta knows what a challenge security and compliance can be for companies. Vanta co-founder Christina Cacioppo worked on Dropbox’s collaborative document project, Paper, when she and her team encountered resistance from the company’s legal team. From legal's perspective, the Paper project was jeopardizing compliance with Dropbox’s customer contracts. Cacioppo helped found Vanta to come up with a software solution to the compliance problem.

Vanta helps companies scale security practices and automate compliance for the most prevalent data security and privacy regulatory frameworks, including SOC 2, ISO 27001, HIPAA, PCI DSS, GDPR, and CCPA. The company's platform gives organizations the tools they need to automate up to 90% of the work required for security audits, and more than 1,500 customers have signed on since its founding in 2016. Vanta is part of the MongoDB for Startups program, which helps early-stage, high-growth startups build faster and scale further, and has used MongoDB as its database of record since its inception.

Next-level security monitoring

Vanta launched in the wake of several high-profile data breaches. Although the company's founders understood that online security was becoming more important, they also knew how hard it could be for fast-growing companies to invest the time and resources needed to build a security foundation. So, they set about building a platform that could withstand not just today's threats but tomorrow's as well.

Robbie Ostrow, now engineering manager, was the first employee the company hired. "Historically, the way proving security worked was that a company would have an auditor look at its platform once a year and issue a piece of paper that says, 'you seem secure,'" Ostrow says. "We check all the same items that an auditor would check, but instead of checking 1% of it once a year, we check 100% once an hour."

Ostrow acknowledges how helpful MongoDB Atlas has been in ensuring state-of-the-art security practices. "As a security company, one thing that's really important is ensuring that our data is separate from everybody else's data and that we are not accidentally exposing random ports to the internet," Ostrow says. "One awesome thing about MongoDB Atlas is a feature called VPC peering, which allows us to take our virtual private cloud (VPC) and communicate with our database cluster while not exposing any cruft to the world."

Integration and scaling

According to Ostrow, Vanta’s decision to use MongoDB from the start has been critical to its success. "We originally chose MongoDB because it was a perfect tool with which we could prototype," Ostrow says. "But we also found that it's a great tool for production systems. And we don't really believe in MVPs for the sake of MVPs because they eventually end up becoming production systems. So luckily we chose MongoDB, which helped us prototype really quickly because we didn't have to build tooling and migrate it to another system. And then it ended up being a tool that was able to scale with us."
Once Vanta moved past an MVP, its growth was intricately tied to how fast it could integrate with other tools and build new features. "The key to the growth we've had is in the number of integrations we've been able to build and new features we've been able to add on top of those integrations," Ostrow says. "MongoDB has helped a lot to allow us to build and ship quickly without any downtime."

Vanta software engineer David Zhu agrees. "MongoDB makes it easy for us to model our data and access it in ways that are very flexible," Zhu says. "As a security company, we're monitoring a lot of different resources, and our understanding of those resources changes over time."

Flexible and familiar

As a company that prizes the ability to iterate rapidly, Vanta finds great value in the flexibility of the document model that underpins MongoDB Atlas. "We have a really strict code base," Ostrow says, "but the flexibility of the data model allows us to move quickly while still feeling safe about the changes we're making."

Getting the developer experience right is key to maximizing the productivity of a limited and costly resource. "Whenever we make changes or need to think about how we want to model our information," Zhu says, "MongoDB has the flexibility to let us make changes on the fly and speed up our development process."

Drew Gregory, a software engineer at Vanta, also highlights the benefit of familiarity when developing in Atlas. "MongoDB's API abstractions tend to feel like JavaScript and JSON objects," Gregory says. "We really enjoy trying to make our entire stack feel and look like TypeScript. So MongoDB, cosmetically, aesthetically, and even programmatically, feels like working with JavaScript the whole way down." Zhu echoed a similar point: "Our technical stack is very straightforward. MongoDB slots right in. All of the data looks similar, and all engineers can work really easily across all aspects of our stack."

That familiarity is important at Vanta because it helps with recruiting efforts. "One thing I like to tell people I'm recruiting is that Vanta tries to move fast and not break too many things," Ostrow said. "Because we're a startup, we need to grow incredibly quickly. But we're also a security company that our customers depend on. And we want to make sure that, while we're able to ship features really quickly, we're not going to violate customers' trust while we're doing so. Hiring people who are able to do this and ensuring that the tools you're using are able to scale are really important." To that end, Ostrow points out: "We're hiring quickly and looking for great new engineers. So get in touch if you're interested."

A program for success

MongoDB for Startups offers startups access to a wide range of resources, including free credits to our best-in-class developer data platform, MongoDB Atlas, personalized technical advice, co-marketing opportunities, and access to our robust developer community. Ostrow credits the MongoDB for Startups program for helping Vanta with its Atlas deployment. "MongoDB sent us a consultant who was able to help optimize the way we were using it and gave us a report with excellent advice across the board," Ostrow says. "We still refer to that report all the time."

Are you part of a startup and interested in joining the MongoDB for Startups program? Apply now.

September 14, 2022
Applied

How to Use MongoDB Atlas to Make Your CRM More Efficient

As part of digital transformation, many companies want to optimize their internal business processes, gain more visibility into important business metrics, and create new automation routines. Data is always at the core of business processes and metrics, and most business-critical data is often located in one or a few repositories, such as a customer relationship management (CRM) system. Historically, business users have relied on spreadsheets and enterprise data warehouses for bringing the data together and making decisions. These solutions can range from a disjointed set of dashboards to an all-in-one central console. But businesses that need to move fast need to iterate on their data and processes fast, and they can’t do that if implementing a change in the CRM takes months or if things are done manually in spreadsheets. This article describes how MongoDB Professional Services created an internal solution to address these issues.

Our approach

In MongoDB Professional Services, we also needed to streamline our business processes and get out of spreadsheets for business management, especially for revenue forecasting. As the organization grew, the amount of manual labor associated with spreadsheet maintenance became untenable, and making sense of the data became more difficult, especially when the data might be inconsistent, stale, or even inaccurate. Ordinarily, a good CRM or Professional Services Automation (PSA) system can help solve this problem. At MongoDB, for example, we use Salesforce, which provides decent flexibility but also requires heavy customization and has limitations. We’ve also seen MongoDB customers address the problem by building ETL pipelines into MongoDB Atlas and taking advantage of MongoDB’s flexible schema, query language and aggregation framework, and Atlas Search. The data from source systems is ingested as-is or remapped to create a single view. The best approach we’ve found, however, is to optimize the schema for how the data will be consumed, with different parts of documents potentially coming from different source systems. Atlas App Services provides a serverless abstraction layer that allows fine-grained but flexible control over the schema to help you avoid conflicts and iterate without breaking compatibility.

After considering alternatives, we created an internal CRM/PSA-augmenting system that is built on top of the MongoDB Atlas platform to provide us with additional capabilities and flexibility. This solution allows Professional Services to rapidly deliver advanced functionality, such as revenue forecasting, automation, and visibility into complex business metrics. The solution also allows Professional Services to address business systems' needs and promptly react to changes, with functionality beyond what is typically provided by other systems.

MongoDB’s internal solution, at its core, is serverless and data-centric, leveraging Atlas App Services functions and triggers for processing the data and Atlas Search for full-text search. It uses the Connector for BI, Atlas GraphQL API, App Services wire protocol, and Atlas Functions to access and manipulate data from other components. Its components include a React-based console application, Atlas Charts, Tableau dashboards, Google Sheets, and microservices for data import and integrations.

Project view of our internal solution console.
Revenue forecasting module in our internal solution console.
MongoDB Charts shows business metrics.
Solution architecture

The data architecture in our internal solution builds on the single view approach and the data-mart concept. The main idea is to ingest relevant data from Salesforce and other systems, enrich it, and build on it quickly, as shown in the following image. We followed these eight key principles to help enable this functionality:

- Focus on bringing in data in the form that makes the most sense for the business, and find the right balance between making the ETL easy and optimizing for the foreseen application use cases.
- Apply transformations in the ETL process to make the ingested data intuitive, including document hierarchy, field names, and data types.
- Clearly define the data lifecycle in terms of data producers and consumers. Data producers can only overwrite documents and fields that they “own” — and only those. For example, the ETL process from the source system should overwrite the data in MongoDB documents as needed, but it should only modify those fields that are actually coming from the pipeline. Aim to structure MongoDB documents in a way that makes it clear which fields are owned by what producer. Atlas App Services schema and rules can help ensure that the most critical documents and fields are correctly accessed and modified.
- Use Atlas Functions and the App Services wire protocol in applications and services, as opposed to directly connecting to the Atlas instance. This allowed us to use Google SSO in the console without requiring any sophisticated security mechanisms when we need to do regular CRUD operations from within the application.
- For complex data logic and on-the-fly calculations, use App Functions.
- Use database triggers for propagating changes and generating data-driven events. Use scheduled triggers for generating aggregated views and periodic work. (A minimal sketch of such a trigger function appears at the end of this article.)
- Use external services for communicating with the outside world (e.g., email sender, ETL job). The external services are invoked asynchronously by listening on change streams from their respective namespaces (pub-sub model). All external services work independently of each other.
- Don’t overthink. MongoDB Atlas’s developer data platform offers a lot of flexibility and, if these principles are followed, making changes and iterating on a working system is surprisingly easy.

To reiterate the last point, our internal solution is easy to modify and extend because of the flexible schema concept in MongoDB and the independence of external components. Users can access the data through available tools and integrations, and developers can update specific parts of the system or introduce new ones without delays, making this solution efficient in terms of both cost and effort.

Conclusion

Through this example of our internal solution, we demonstrated that by leveraging MongoDB Atlas in full force, you can solve seemingly intractable business problems with speed, efficiency, and robustness beyond what regular systems can do. Whether you’re optimizing your company’s business processes, building business dashboards, or improving automation, the MongoDB Atlas developer data platform can help make the process easier. Learn how MongoDB’s consulting engineers can help you with design and architecture decisions and accelerate your development efforts. Contact us to learn more.
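As a rough illustration of the trigger-based principles above, here is a minimal sketch of an Atlas App Services database trigger function. It is not the actual internal implementation; the service name assumes the default linked cluster, and the collection and field names are invented for the example.

```javascript
// Hypothetical database trigger function: when a project document changes,
// recompute a simple revenue forecast and upsert it into a "forecasts"
// collection, touching only the fields this producer "owns".
exports = async function (changeEvent) {
  const cluster = context.services.get("mongodb-atlas"); // default linked data source name
  const forecasts = cluster.db("psa").collection("forecasts");
  const project = changeEvent.fullDocument; // requires full document enabled on the trigger

  await forecasts.updateOne(
    { projectId: project._id },
    {
      $set: {
        expectedRevenue: project.rate * project.remainingHours,
        updatedAt: new Date(),
      },
    },
    { upsert: true }
  );
};
```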

September 12, 2022
Applied

3 Factors Limiting Developers’ Innovation

Software has steadily become the engine of business growth and innovation, which has led to exponential growth in demand for new applications, whether for businesses or consumers. According to the International Data Corporation, 750 million new applications will be built by 2025. That means more applications will be built over the next few years than were built in the software industry’s first 40 years.

With many thousands of new applications rolling out every month, businesses need more developers who can innovate. Indeed, the U.S. Bureau of Labor Statistics reports that in the United States alone, the workforce will need 400,000 more developers by the end of this decade. But in addition to an increasing number of developers, organizations also need to ensure that their development teams are productive, efficient, and able to innovate. A recent MongoDB survey suggests that developers are struggling with that.

How do developers spend their time?

The goal for developers is to define and build new features and applications. This type of innovation is crucial to business success, since software innovation leads to benefits such as improved customer experience, cost reduction, and increased productivity. MongoDB’s 2022 Report on Data and Innovation, a survey of 2,000 Asia-Pacific technology professionals, found that companies share two top goals for innovation: increasing internal efficiency and productivity, and building better products. In other words: building better stuff, faster.

But is this happening? The survey says “not really.” Here is how those 2,000 IT professionals reported spending their time: Only 28% of technology teams’ time is spent defining and building new features and applications, compared with a whopping 72% spent managing infrastructure and completing administrative tasks and projects. Needless to say, this is not conducive to innovation.

What is limiting developer innovation?

What is blocking developers from spending more time building new software? The survey points to three top contributors:

High developer workloads: One Haystack survey reports that 80% of developers describe themselves as burned out. Burnout obviously affects an employee’s ability to innovate and create quality work, and with the continued growth of data volume and app creation, it is only getting worse. Addressing this problem means giving developers the proper tools and a simplified data architecture, both of which reduce cognitive load and let them build applications more efficiently.

Complex data architecture: Our survey found that complexity limits innovation. Whether it’s a legacy system with decades of organic sprawl or a cloud environment that has become overly complex as more and more components have been added, a “spaghetti architecture” requires developers to spend significant time learning, connecting, and maintaining disparate technologies.

Legacy systems and technical debt: The systems that businesses use, especially outdated technology and overly complex systems, are often major blockers for developers and for an organization’s innovation. Huge amounts of time and resources go into maintenance and into building ways to connect old systems to newer technology. Even as digital transformation efforts move many companies to the cloud, a McKinsey survey found that 60% of CIOs saw their technical debt increase over the previous three years. This means that IT decisions made years or decades ago hobble the agility of today’s developers.

Want to learn more about developers, data, and innovation? Download MongoDB’s 2022 Data and Innovation Report.

September 8, 2022
Applied

Free Your Data With the MongoDB Relational Migrator

Nothing is more frustrating than data that is just out of reach. Imagine wanting to combine customer behavior data from your CRM and usage data from your legacy product to trigger tailored promotions in your new mobile app, but not being able to locate the required data in the sea of tables in your relational database. As MongoDB CTO Mark Porter explains in his MongoDB World keynote, the data that can make a difference might be locked up “somewhere that you can’t use.” Relying on his own hard-earned experience with data, Porter adds that this information can be trapped “in a schema with hundreds or thousands of tables that have built up over decades.”

“Schema is a huge part of this problem,” MongoDB product manager Tom Hollander explains during a presentation on MongoDB Relational Migrator at MongoDB World 2022. “So we’ve spent a lot of time building out the tools to enable you to map your tabular relational schema into a document schema and make use of the full power of the MongoDB document model.”

To see MongoDB Relational Migrator in action, check out this introduction and demo from MongoDB World 2022, featuring MongoDB product manager Tom Hollander.

What is MongoDB Relational Migrator?

MongoDB Relational Migrator streamlines migrations from legacy data infrastructure to MongoDB by helping developers analyze relational database schemas, convert them into MongoDB schemas, and then migrate data from the source database to MongoDB. Currently, Relational Migrator is compatible with four of the most common relational databases: Oracle, SQL Server, MySQL, and PostgreSQL. Relational Migrator not only moves data from your relational database to MongoDB but also transforms it according to your new schema.

As Hollander and MongoDB product marketing director Eric Holzhauer point out, developers often use a mix of software and tools (e.g., extract-transform-load pipelines, change data capture (CDC), message queues, and streaming) to execute migrations, which can be complicated, risky, and error-prone. Relational Migrator provides a single tool that streamlines the process while ensuring that your data lands in an organized, logical manner. By simplifying schema translation — one of the most complex, difficult parts of any relational migration — Relational Migrator gives developers and other technical teams greater control over (and increased visibility into) their new MongoDB schema (a hypothetical example of such a mapping appears later in this article). The result is data that is more accessible for analysis and decision-making. “Now I can get at the data in my program without going through a translation layer,” Porter explains.

A visual representation of how Relational Migrator maps relational schema to document schema.

Migration mode: Snapshot or ongoing?

Relational Migrator provides two modes of data transfer: a one-time snapshot or a continuous sync (which will be available later this year). To decide which mode to use, consider whether you can move to MongoDB and immediately decommission your previous database, or whether you need to keep your existing relational database up and running. Organizations may wish to keep their relational database for various reasons, such as testing the effectiveness of the proposed document schema, running out a contract or licensing agreement to avoid expensive fees, or keeping old databases available for audits. In that situation, you can keep your relational database running, and Relational Migrator will continue to push data from your source to your new MongoDB clusters.
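To make the idea of mapping tables into documents concrete, here is a purely hypothetical before-and-after. The table names, columns, and values are invented for illustration and are not output from Relational Migrator itself, which is configured through its own interface; the sketch only shows the common pattern of embedding child-table rows as an array inside the parent document.

```python
# Hypothetical source (relational) schema:
#   customers(id, name, email)
#   orders(id, customer_id, total, placed_at)
#
# One possible target document: a customer with their orders embedded,
# so data that is read together is stored together.
migrated_customer = {
    "_id": 1001,                        # customers.id
    "name": "Ada Lovelace",             # customers.name
    "email": "ada@example.com",         # customers.email
    "orders": [                         # orders rows where customer_id = 1001
        {"orderId": 507, "total": 129.90, "placedAt": "2022-06-14T09:30:00Z"},
        {"orderId": 513, "total": 42.00, "placedAt": "2022-07-02T16:05:00Z"},
    ],
}
```

Whether to embed, reference, or flatten a given relationship is exactly the kind of schema decision the mapping step is meant to put in your hands.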
The limits of Relational Migrator

As Hollander points out, Relational Migrator is only a tool — one intended to facilitate schema mapping, with many abilities and options for effective schema design. “It’s not a silver bullet that will immediately modernize your application portfolio,” Hollander says. “It’s not going to do everything for you. You still have to do the planning.” Furthermore, because schema design is a tricky topic even for seasoned experts, Hollander notes that developers would benefit from working with architects, consultants, and partners — especially if they’re not familiar with MongoDB or schema design best practices.

Relational Migrator does not yet support continuous replication, which would enable your relational database and MongoDB clusters to coexist for an extended period of time. However, Hollander says that work on this feature is ongoing and that it will be available in the future, along with additional capabilities such as schema recommendations, an integration with the MongoDB Atlas developer data platform, and more.

MongoDB Relational Migrator is currently in early access, for use on non-production workloads with assistance from our Product and Field Engineering teams. To learn more, get in touch with your MongoDB rep or contact us via our Migrator page to discuss your workload and next steps.

September 6, 2022
Applied

Moving From Monolith to Microservices: Mark Porter and Accenture’s Michael Ljung Explain

The first step in digital transformation for many organizations is to migrate from legacy on-premises environments and move as many workloads as possible into the public cloud. As seen in the first in our series of conversations between Mark Porter, CTO of MongoDB, and Michael Ljung, Accenture’s Global Lead, Software Engineering, Accenture Cloud First, this is not always easy, but with the right tools and planning, the migration can reap great benefits.

The next step in many organizations’ transformation is to dismantle their monolithic applications — which often limit a business’s ability to innovate quickly — and move to applications built on a microservices architecture. Many organizations are already well on their way: Research shows that 36% of large companies, 50% of medium companies, and 44% of small companies are already using microservices in production and development.

To explain this migration away from the monolith, Porter and Ljung sat down to discuss the benefits of microservices, how to size those services properly for best results, and how an Accenture customer used a microservices approach to quickly roll out new features to help provide COVID-19 vaccinations. Watch their full discussion.

Why microservices?

Although teams choose a microservices architecture for a variety of reasons and use cases, one driving force is that businesses now rely so heavily on software for competitive advantage that they require a more rapid development cycle for new releases. A monolithic approach does not support the fast time-to-market cycles needed, nor does it provide the working environment developers need to speed the release process. In their conversation, Porter and Ljung cover several benefits of moving away from the monolith and adopting microservices at the proper size, including the following:

Microservices align with how humans work best together. A large, monolithic codebase creates complexity and an immense cognitive load for developers.

They offer protection from complete downtime. Microservices allow for compartmentalization to avoid a single point of failure. By contrast, with a monolithic application, if something goes wrong, everything goes wrong.

They allow for better application scaling. With a microservices architecture, only the features that require extra performance need to be scaled.

They increase your speed to market. Teams that moved to microservices and containers have reported a 13x increase in the frequency of software releases.

Read the first installment in this cloud migration series, “Migrating to the Cloud Isn't As Easy As Most People Think.”

September 1, 2022
Applied

3 Reasons (and 2 Ways) to Use MongoDB’s Improved Time Series Collections

Time series data, which reflects measurements taken at regular time intervals, plays a critical role in a wide variety of use cases across a diverse range of industries. For example, park management agencies can use time series data to examine attendance at public parks to better understand peak times and schedule services accordingly. Retail companies, such as Walmart, depend on it to analyze consumer spending patterns down to the minute, to better predict demand and improve shift scheduling, hiring, warehousing, and other logistics. As more sensors and devices are added to networks, time series data and its associated tools have become more important. In this article, we’ll look at three reasons (and two ways) to use MongoDB time series collections in your stack.

This in-depth introduction to time series data features MongoDB Product Manager Michael Gargiulo.

Reason 1: Purpose-built for the challenges of time series data

At first glance, time series collections resemble other collections within MongoDB, with similar functionality and usage. Beneath the surface, however, they are specifically designed for storing, sorting, and working with time series data.

For developers, query speed and data accessibility continue to be challenges associated with time series data. Because of how quickly time series data can accumulate, it must be organized and sorted in a logical way to ensure that queries and their associated operations run smoothly and quickly. To address this issue, time series collections implement a key tenet of the MongoDB developer data platform: Data that is stored together is accessed together. Documents (the basic building block of MongoDB data) are grouped into buckets, which are organized by time. Each bucket contains time series data from a variety of sources, all of which were gathered from the same time period and all of which are likely to show up in the same queries. For example, if you are using time series collections to analyze the rise in summer temperatures in Valencia, Spain, from 1980 to 2020, then one bucket will contain temperatures for August 1991. Relevant but distinct buckets (such as temperatures for June and July 1991) would also be stored on the same page for faster, easier access.

MongoDB also lets you create compound indexes on any measurement field in the bucket (whether it’s the timeField or the metaField) for faster, more flexible queries. Because of the wide variety of indexing options, operations on time series data can be executed much more quickly than with competing products. For example, scan times are reduced by indexing buckets of documents (each of which has a unique identifier) rather than individual documents. In terms of the previous example, you could create an index on the minimum and maximum average summer temperatures in Valencia from 1980 to 2020 to surface the necessary data more quickly. That way, MongoDB does not have to scan the entire dataset to find the min and max values over a period of nearly four decades.

Another concern for developers is finding the last metadata value, which, in other solutions, requires users to scan the entire data set — a time-consuming process. Instead, time series collections use last point queries, where MongoDB simply retrieves the last measurement for each metadata value. As with other fields, users can also create indexes for the last points in their data. In our example, you could create an index to identify the end-of-summer temperatures in Valencia from 1980 to 2020.
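Here is a minimal sketch, using the Python driver, of what the features described above might look like in practice: creating a time series collection, adding a compound secondary index on the metadata and time fields, and running a last point query. The cluster URI, database, collection, and field names are placeholders chosen for illustration.

```python
from pymongo import ASCENDING, DESCENDING, MongoClient

# Placeholder connection string and names.
client = MongoClient("mongodb+srv://cluster0.example.mongodb.net")
db = client["weather"]

# Create a time series collection; MongoDB buckets the documents by time internally.
db.create_collection(
    "temperatures",
    timeseries={"timeField": "ts", "metaField": "station", "granularity": "hours"},
)

# Compound secondary index on the metadata and time fields for fast per-station queries.
db.temperatures.create_index([("station", ASCENDING), ("ts", DESCENDING)])

# Last point query: the most recent reading for each station.
last_points = db.temperatures.aggregate([
    {"$sort": {"station": 1, "ts": -1}},
    {"$group": {"_id": "$station", "lastReading": {"$first": "$$ROOT"}}},
])
```

With an index like the one above in place, this kind of last point query can typically be answered without scanning the whole collection.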
By indexing the last values, time series collections can drastically reduce query times.

Another recurring challenge for time series applications is data loss from Internet of Things (IoT) applications in industries such as manufacturing, meteorology, and more. As sensors go offline and gaps appear in your data, it becomes much more difficult to run analytics, which require a continuous, uninterrupted flow of data. As a solution, the MongoDB team created densification and gap filling. Densification, executed by the $densify aggregation stage, creates blank placeholder documents to fill in any missing timestamps. Users can then sort the data by time and run the $fill stage for gap filling, which estimates and adds any null or missing values in documents based on the existing data. By using these two capabilities in tandem, you get a steady flow of data to feed into aggregation pipelines for insights. (A short sketch using both stages appears a little later in this article.)

Reason 2: Keep everything in house, in one data platform

Juggling different data tools and platforms can be exhausting. Cramming a bunch of separate products and technologies into a single infrastructure can create complex architectures and require significant operational overhead. Additionally, a third-party time series solution may not be compatible with your existing workflows and may necessitate more workarounds just to keep things running smoothly. The MongoDB developer data platform brings together several products and features into a single, intuitive ecosystem, so developers can use MongoDB to address many common needs — from time series data to change streams — while reducing time-consuming maintenance and overhead.

As a result, users can take advantage of the full range of MongoDB features to collect, analyze, and transform time series data. You can query time series collections through the MongoDB Compass GUI or the MongoDB Shell; use familiar MongoDB capabilities such as nesting data within documents, secondary indexes, and operators like $lookup or $merge; and process time series data through aggregation pipelines to extract insights and inform decision making.

Reason 3: Logical ways to organize and access time series data

Time series collections are designed to be efficient, effective, and easy to use. For example, these collections use a columnar storage format that is optimized for time series data. This approach ensures efficiency in all database operations, including queries, input/output, WiredTiger cache usage, and storage footprints for both data and secondary indexes.

Let’s look, for example, at how querying time series collections works. When a query is executed, two things happen behind the scenes: bucket unpacking and query rewrites. To begin with, time series collections automatically unpack buckets — similar to the $unwind command. MongoDB decompresses the data, sorts it, and returns it to the format in which it was inserted, so that it is easier for users to read and parse. Query rewrites work alongside bucket unpacking to ensure efficiency. To avoid unpacking too many documents (which exacts a toll in time and resource usage), query rewrites use indexes on fields such as timestamps to automatically eliminate buckets that fall outside the desired range. For example, if you are searching for average winter temperatures in Valencia, Spain, from 1980 to 2020, you can exclude all temperatures from the spring, summer, and fall months.
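As promised earlier, here is a minimal sketch of the densification and gap-filling stages, continuing with the hypothetical weather collection from the previous example; the field names and hourly step are illustrative.

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://cluster0.example.mongodb.net")
db = client["weather"]

gap_filled = db.temperatures.aggregate([
    # Insert placeholder documents for any missing hourly timestamps, per station.
    {
        "$densify": {
            "field": "ts",
            "partitionByFields": ["station"],
            "range": {"step": 1, "unit": "hour", "bounds": "full"},
        }
    },
    # Estimate the missing temperature values by linear interpolation.
    {
        "$fill": {
            "partitionByFields": ["station"],
            "sortBy": {"ts": 1},
            "output": {"tempC": {"method": "linear"}},
        }
    },
])
```

The result is an uninterrupted hourly series per station, ready to feed into further aggregation stages.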
Now that we’ve examined several reasons to consider MongoDB time series collections, let’s look at two specific use cases.

Use case 1: Algorithmic trading

Algorithmic trading is a major use case for time series data, and this market is predicted to grow to $15 billion by 2028. The strength of algorithms lies in their speed and automation; they reduce the possibility of mistakes stemming from human emotions or reaction time and allow for trading at a frequency beyond what a human can manage. Trading algorithms also generate vast volumes of time series data, which cannot necessarily be deleted, due to compliance and forecasting needs. MongoDB, however, lets you set archival parameters to automatically move that data into cheaper cloud object storage after a preset interval of time, preserving your valuable storage space for more recent data.

Using MongoDB products such as Atlas, materialized views, time series collections, and triggers, it is also possible to build a basic trading algorithm. Time series data is fed into the algorithm, and when conditions are ideal, the algorithm can buy or sell as needed, executing a series of individual trades with cumulative profits and losses (P&L). Although you’ll need a Java app to actually execute the trades, MongoDB can provide a strong foundation on which to build.

The structure of such an algorithm is simple. Time series data is loaded from a live feed into MongoDB Atlas, which then feeds it into a materialized view to calculate the averages that will serve as the basis of your trades. You can also add a scheduled trigger to execute when new data arrives, refreshing your materialized views, keeping your algorithm up to date, and making sure you don’t miss any buying or selling opportunities. (A brief sketch of this materialized-view pattern appears toward the end of this article.) To learn more, watch Wojciech Witoszynski’s MongoDB World 2022 presentation on building a simple trading algorithm using MongoDB Atlas, “Algorithmic Trading Made Easy.”

Use case 2: IoT

Because of the nature of IoT data, such as frequent sensor readings at fixed times throughout the day, IoT applications are ideally suited for time series collections. For example, Confluent, a leading streaming data provider, uses its platform alongside MongoDB Atlas Device Sync, mobile development services, time series collections, and triggers to gather, organize, and analyze IoT data from edge devices. IoT apps often feature high volumes of data taken over time from a wide range of physical sensors, which makes it easy to fill in meta fields and take advantage of the densification and gap-filling features described above.

MongoDB’s developer data platform also addresses many of the challenges associated with IoT use cases. To begin with, MongoDB is highly scalable, which is an important advantage given the huge volumes of data generated by IoT devices. Furthermore, MongoDB includes key features that let you make the most of your IoT data in real time. These include change streams for identifying database events as they occur, and functions, which can be scheduled or configured to execute instantly in response to database changes and other events.

For users dealing with time-based data, real-time or otherwise, MongoDB’s time series collections offer a seamless, highly optimized way to accelerate operations, remove friction, and use tools, such as triggers, to further analyze and extract value from their data.
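As referenced in the trading example above, here is a hedged sketch of what such a materialized-view refresh might look like: a 20-tick moving average per symbol, merged into a separate collection that the algorithm reads from. The collection and field names are hypothetical, and in Atlas this pipeline would typically be run from a scheduled trigger rather than a standalone script.

```python
from pymongo import MongoClient

# Hypothetical cluster and collection names.
client = MongoClient("mongodb+srv://cluster0.example.mongodb.net")
ticks = client["markets"]["ticks"]

ticks.aggregate([
    # Compute a moving average over the current tick and the 19 before it, per symbol.
    {
        "$setWindowFields": {
            "partitionBy": "$symbol",
            "sortBy": {"ts": 1},
            "output": {
                "movingAvg": {
                    "$avg": "$price",
                    "window": {"documents": [-19, 0]},
                }
            },
        }
    },
    # Upsert the results into the materialized view the algorithm queries.
    {
        "$merge": {
            "into": "ticks_moving_avg",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }
    },
])
```

Re-running this pipeline on a schedule keeps the ticks_moving_avg collection current without recomputing anything in the application itself.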
Across all of these use cases, users no longer have to manually bucket, query, or otherwise troubleshoot time series data; MongoDB does that work for them. Try MongoDB time series collections for free in MongoDB Atlas.

August 30, 2022
Applied
