Migrate Your Mindset to the Cloud Along with Your Data — A Conversation with Mark Porter and Accenture’s Michael Ljung
The challenges of getting data and applications into the cloud are well-known. Technology isn’t always the hardest part of cloud migration, however, and it won’t produce digital transformation on its own. In many cloud migrations, both people and processes need to change along with the technology. That’s because the processes that work in a legacy environment won’t necessarily help an organization thrive in the cloud. Instead, the opposite can happen: Legacy procedures tend to produce legacy results, making it difficult to achieve the impact that so many organizations seek from cloud-based digital transformation. As part of an ongoing series on cloud migration and digital transformation, MongoDB CTO Mark Porter sat down with Michael Ljung, Accenture’s Global Engineering Lead, to examine new approaches and new ways of thinking that can be crucial to success in the cloud. Experience and perspective During their conversation, both Porter and Ljung recounted situations during which they, and their organizations, were called upon to partner in a new way with clients that were struggling to migrate to the cloud. Each knew that their experience and perspective could lead to success for these organizations. They also knew that their message — that sometimes it’s necessary to go slow to go fast — might not find receptive ears. When organizations bring their old procedures and old deployment technologies to the cloud, “They’re in two places for a little while,” Porter says. Their data and applications may be in the cloud, but their mindset is on premises. Porter says that the MongoDB team helped a large cryptocurrency exchange through this exact situation. MongoDB helped the exchange get through the learning curve associated with new technologies, acting as an embedded member of the team and even guiding them in setting quarterly goals for their migrations. Ljung described a large government client that wanted to move to the cloud and do it quickly. 
The organization embraced agile methodologies but didn’t have the automation or the experience with CI/CD to support cloud development. They were releasing new code to production almost daily, but a fix in one place could easily cause a breaking change elsewhere, and often did. Digital done right The solution was to take a step back. Accenture started by supporting the organization in mastering incremental delivery. Next came some basic automation. With that in place, the organization was able to return to agile methodologies and organize themselves into sprints. Now, Ljung says, “This client is an example of digital transformation done right” — all because, as he and Porter agreed, they were willing to go slow to go fast. Watch the full video series with Mark Porter and Michael Ljung to learn more about the strategies that support successful cloud migration and digital transformation.
4 Ways to Create a Zero Trust Environment in Financial Services
For years, security professionals protected their IT much like medieval guards protected a walled city: they made it as difficult as possible to get inside. Once someone was past the perimeter, however, they had generous access to the riches within. In the financial sector, this would mean access to personally identifiable information (PII), including a “marketable data set” of credit card numbers, names, social security information, and more. Sadly, such breaches occurred in many cases, adversely affecting end users. A famous example is the Equifax incident, where a small breach led to years of unhappy customers.

Since then, the security mindset has changed. Users increasingly access networks and applications from any location, on any device, on platforms hosted in the cloud, so the classic point-to-point security approach is obsolete. The perimeter has changed, so reliance on it as a protective barrier has changed as well. Given the huge amount of confidential client and customer data that the financial services industry deals with on a daily basis, and the strict regulations that apply to it, security needs to be an even higher priority. The perceived value of this data also makes financial services organizations a primary target for data breaches. In this article, we’ll examine a different approach to security, called zero trust, that can better protect your assets.

Paradigm shift

Zero trust presents a new paradigm for cybersecurity. In a zero trust environment, the perimeter is assumed to have been breached; there are no trusted users, and no user or device gains trust simply because of its physical or network location. Every user, device, and connection must be continually verified and audited. Here are four concepts to know about creating a zero trust environment.

1. Securing the data

Although securing access to banking apps and online services is vital, the database, which is the backend of these applications, is a key part of creating a zero trust environment.
The database contains much of an organization’s sensitive, and regulated, information, along with data that may not be sensitive but is critical to keeping the organization running. Thus, it is imperative that a database be ready and able to work in a zero trust environment. As more databases are becoming cloud-based services, an important aspect is ensuring that the database is secure by default—meaning it is secure out of the box. This approach takes some of the responsibility for security out of the hands of administrators, because the highest levels of security are in place from the start, without requiring attention from users or administrators. To allow access, users and administrators must proactively make changes— nothing is automatically granted. As more financial institutions embrace the cloud, securing data can get more complicated. Security responsibilities are divided between the clients’ own organization, the cloud providers, and the vendors of the cloud services being used. This approach is known as the shared responsibility model. It moves away from the classic model where IT owns hardening of the servers and security and then needs to harden the software on top—for example, the version of the database software—and then harden the actual application code. In this model, the hardware (CPU, network, storage) are solely in the realm of the cloud provider that provisions these systems. The service provider for a Data-as-a-Service model then delivers the database hardened to the client with a designated endpoint. Only then does the actual client team and their application developers and DevOps team come into play for the actual solution. Security and resilience in the cloud are only possible when everyone is clear on their roles and responsibilities. 
Shared responsibility means that cloud vendors ensure their products are secure by default while remaining available, and that organizations take appropriate steps to keep protecting the data they store in the cloud.

2. Authentication for customers and users

In banks and finance organizations, there is a lot of focus on customer authentication, or making sure that accessing funds is as secure as possible. It’s also important, however, to ensure secure access to the database on the other end. An IT organization can use various methods to allow users to authenticate themselves to a database. Most often, the process includes a username and password. But, given the increased need for financial services organizations to maintain the privacy of confidential customer information, this step should only be viewed as a base layer. At the database layer, it is important to have Transport Layer Security (TLS) and SCRAM authentication, which together allow traffic from clients to the database to be authenticated and encrypted in transit. Passwordless authentication should also be considered, not just for customers but for internal teams as well. The database can support this in multiple ways; for example, auto-generated certificates may be required to access it. Advanced options exist for organizations already using X.509 certificates that have a certificate management infrastructure.

3. Logging and auditing

In the highly regulated financial industry, it is also important to monitor your zero trust environment to ensure that it remains in force and encompasses your database. The database should be able to log all actions or have functionality to apply filters to capture only specific events, users, or roles. Role-based auditing lets you log and report activities by specific roles, such as userAdmin or dbAdmin, coupled with any roles inherited by each user, rather than having to extract activity for each individual administrator.
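As a rough illustration of role-based auditing, the sketch below filters a list of audit events down to those performed by users who hold an audited role, either directly or through inheritance. It is a plain Python model, not a database API: the `AuditEvent` class, the `ROLE_INHERITANCE` map, and the function names are all assumptions made for this example, and the role hierarchy is deliberately simplified.

```python
from dataclasses import dataclass

# Simplified, illustrative role hierarchy (an assumption for this sketch):
# each role maps to the roles it inherits.
ROLE_INHERITANCE = {
    "dbOwner": {"dbAdmin", "userAdmin", "readWrite"},
    "dbAdmin": set(),
    "userAdmin": set(),
    "readWrite": {"read"},
    "read": set(),
}


@dataclass(frozen=True)
class AuditEvent:
    user: str
    direct_roles: frozenset  # roles granted directly to the user
    action: str


def effective_roles(direct_roles):
    """Expand directly granted roles with every role they inherit."""
    seen = set()
    stack = list(direct_roles)
    while stack:
        role = stack.pop()
        if role not in seen:
            seen.add(role)
            stack.extend(ROLE_INHERITANCE.get(role, ()))
    return seen


def filter_by_roles(events, audited_roles):
    """Keep events whose user holds an audited role, directly or by inheritance."""
    audited = set(audited_roles)
    return [e for e in events if effective_roles(e.direct_roles) & audited]


events = [
    AuditEvent("alice", frozenset({"dbOwner"}), "createUser"),
    AuditEvent("bob", frozenset({"dbAdmin"}), "dropCollection"),
    AuditEvent("carol", frozenset({"read"}), "find"),
]

# Report on administrative roles only, without listing each administrator.
admin_events = filter_by_roles(events, {"userAdmin", "dbAdmin"})
```

Here `alice` is captured because her role inherits the audited ones, which is the point of filtering by role rather than by individual user.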
Role-based auditing makes it easier for organizations to enforce end-to-end operational control and maintain the insight necessary for compliance and reporting.

4. Encryption

With large amounts of valuable data, financial institutions also need to make sure that they are embracing encryption in flight, at rest, and even in use. Securing data with client-side, field-level encryption allows you to move to managed services in the cloud with greater confidence. The database only works with encrypted fields, and organizations control their own encryption keys rather than having the database provider manage them. This additional layer of security enforces an even more fine-grained separation of duties between those who use the database and those who administer and manage it. Also, as more data, some of it belonging to highly sensitive workloads, is transmitted and stored in the cloud, additional technical options to control and limit access to confidential and regulated data are needed. However, this data still needs to be used. So, ensuring that in-use data encryption is part of your zero trust solution is vital. This approach enables organizations to confidently store sensitive data, meeting compliance requirements while also enabling different parts of the business to gain access to it and draw insights from it.

Conclusion

In a world where security of data is only becoming more important, financial services organizations rank among those with the most to lose if data gets into the wrong hands. Ditching the perimeter mentality and moving toward zero trust, especially as more cloud and as-a-service offerings are embedded in infrastructure, is the only way to truly protect such valuable assets.

Learn more about developing a strategic advantage in financial services. Read the ebook now.
Women’s Advocacy Summit Recap: The Value of Inclusive Cultures
It’s July 26, 2022, and Sandhya Parameshwara, Managing Director, Accenture, opens the Women’s Advocacy Summit with a stark wake-up call: There are clear disconnects between business leaders’ perceptions of the importance of workplace culture and inclusivity and those of their employees and the wider public, especially millennials. Many leaders see culture as difficult to measure and link to business performance. Consequently, other issues often take a higher priority. Parameshwara, however, points to research that suggests that businesses with a strong focus on culture and equality also have staff, particularly women, who are more likely to reach senior positions and benefit from growth through innovation. Ahead of this curve are the people Parameshwara describes as “culture makers”—those who recognize the importance of an inclusive culture and reward those who strive to achieve it. “Culture makers are the people who say, who do, and then who drive,” she explains. “They are self-aware. They are relevant in the marketplace. They recognize and see the importance of the culture. They promote and advocate progress.” This notion set the tone for the rest of the Women’s Advocacy Summit, an event hosted in collaboration with the MongoDB Women’s Group, AT&T’s Women’s Group, and Women in Samsung Electronics. Two hundred women tech leaders and their allies came together to discuss the inequality that women continue to face in the workplace, how companies will forge ahead to accelerate their organizations’ equality, and how they’ll work to retain and cultivate their female talent. The power of courage Anne Chow, who recently retired as CEO at AT&T Business, is a clear example of a culture maker. Chatting with MongoDB CEO Dev Ittycheria, Chow discusses the value of positive change and shifting corporate dynamics. “There's no question that the future and our present require leaders to become truly inclusive,” she says. 
“It’s an evolving art and an evolving science.” Chow also believes there has been an evolution in corporate structures. “The power is flipped. It’s now in the hands of employees,” she explains. “One of the key things about being an inclusive leader is we need to meet people and align with where they want to be and where they want to go.” For Chow, positive change is “so desperately needed, across our businesses, across society, across the community,” and driving inclusivity requires a particular set of skills and attitudes. “Courage, especially moral courage, is one of the most foundational characteristics of great leadership,” Chow says. “You also need the realization that mistakes are simply part of the journey.” Ittycheria recalls an adage he gives his children: “Success is not the absence of problems, it's the ability to deal with them.” He adds, “Hope is not a strategy; you have to take a proactive approach. You have to find a way to navigate the difficult issues.” One of the difficult issues that women—especially if they’re parents—often struggle with is work-life balance, although this is a concept that Chow challenges. “One of my famous sayings is, ‘Balance is bogus.’ Why? You have one life that has personal characteristics and professional characteristics, and you are leading that one life.” Chow prefers to view life as an “optimization equation” in which you can have it all, just not necessarily at the same time. She also says that leaders must recognize that attitudes will vary. “What are you trying to optimize to? There is no answer that Dev or I or anybody could give you that's going to inform you what the right choice is for you.” Pay it forward A panel discussion brings a wider perspective as Asya Kamsky, a principal engineer at MongoDB, invites four women leaders to share their views. 
Key themes include the importance of support networks, juggling the responsibilities of work and parenting, and the obligation to mentor women as they build their careers. Having grown up in India and Africa, Anjali Nair, Microsoft’s VP of Azure Operators, is familiar with cultural biases in technology. And while things have changed in the past few decades, she still believes there is a long way to go before the balance of representation is fully redressed. “It's really about women uplifting and sponsoring each other,” she says. “I want to make sure I'm doing my part. I've been involved in grassroot initiatives where we get women involved in STEM at high schools and colleges. This is going to be a continuous process.” Success strategies for women have also evolved from simply being “more like the men,” says Leigh Nager, Vice President of mobile and networks commercial law at Samsung. “We're starting to understand that women bring characteristics to the table that are good for business,” she adds. “But how did we get that recognition? We had to get representation in the first place.” Many of these themes resonate with AT&T’s Vice President of eCommerce, Maryanne Cheung, who says that while being a woman in a largely male-led industry was once a “badge of honor” for her, the value of having a peer support group became critical, especially when she had concerns about starting a family. “I had a network I could reach out to and get advice from,” she recalls. “It’s important to recognize where we can show women more of our authentic selves at all stages of our lives. It's something I'm really passionate about.” Tara Hernandez, engineering VP at MongoDB, acknowledges support she has received, and that she in turn has her own duty and obligation to “pay that forward.” She also echoes Nager’s view that there is a strong commercial argument for fostering an inclusive culture. “It's not just about growing women in tech,” she concludes. 
“It's about recognizing that all of us bring something valuable that will lead to innovation, growth, and business success that are all ultimately in our best interests.” There’s still time to register for the next MongoDB Women’s Group event. Register to attend “Forging your Path as a Woman in Tech” on October 13 12:30pm - 1:30pm, 3:30pm - 4:30pm EDT. Interested in pursuing a career at MongoDB? We have several open roles on our teams across the globe, and we’d love for you to build your career with us.
Relational to NoSQL at Enterprise Scale: Lessons from Amazon
When most people think about Amazon, they think of the cloud. But the company was founded more than a decade before anyone was talking about the cloud. In fact, by 2002, when Amazon founder Jeff Bezos wrote a now-famous internal email directing all new software development to be designed around service-oriented architecture, Amazon was already a $5 billion enterprise. In 2017, Amazon was generating more than 50 times that annual revenue, and like many enterprise organizations, the core of that revenue was driven by the monolithic services that formed the backbone of the business. Those monoliths didn’t go away overnight, and in 2017 and 2018, Amazon kicked off a massive RDBMS-to-NoSQL migration project called “Rolling Stone” to move about 10,000 RDBMS-backed microservices as well as decompose the remaining monoliths into microservices backed by NoSQL. Amazon chose to use its own NoSQL database, but the lessons from that huge effort are valuable for any migration to a NoSQL or document database. In this article, I’ll share some of the insights gained about when and how to use NoSQL. RDBMS costs At the time of this migration, I ran the NoSQL Blackbelt Team for Amazon’s retail business, which was the center of excellence for the business and which developed most of the design patterns and best practices that Amazon uses to build NoSQL-backed application services today. In 2017, Amazon had more than 3,000 Oracle server instances, 10,000 application services and 25,000 global developers, and almost the entire development team was well versed in relational database technology. The cost of the IT infrastructure driving the business, however, was spiraling out of control. As the team started to look for root causes, they quickly realized that the cost of the relational database management system (RDBMS) was a big line item. The infrastructure required to support RDBMS workloads was enormous and did not scale well to meet the needs of the company’s high-demand services. 
Amazon had the biggest Oracle license and the largest RAC deployments in the world, and the cost and complexity of scaling services built on RDBMS were negatively affecting the business. As a result, we started looking at what we were actually doing in these databases. A couple of interesting things came out. We found that 70% of the access patterns that we were running against the data involved a single row of data on a single table. Another 20% were on a range of rows on a single table. So, we weren’t running complex queries against the data at high velocity. In fact, the vast majority were just inserts and updates, but many of those were executed “transactionally” across disparate systems using two-phase commits to ensure data consistency. Additionally, the cost was very high for the other 10% of the access patterns because most were complex queries requiring multiple table joins.

Technology triggers

While the team was looking into these issues, they also noticed a trend in the industry: Per-core CPU performance was flattening, and the server processor industry was not investing enough in 5 nm fabrication technology to meet the efficiency increases described by Moore’s Law. This is one of the reasons why Amazon built its own processor. If you look at the history of data processing, you’ll see a series of peaks and valleys in what can be defined as “data pressure,” or the ability of a system to process the required amount of data at a reasonable cost and within a reasonable amount of time. When one of these dimensions is broken, it defines a “technology trigger” that signals the need to invent something. At Amazon, we saw that the cost efficiency of the relational database was declining while the TCO of high time-complexity queries was increasing as a result. Something had to change. Relational data platforms only scale well vertically, which means getting a bigger box.
Sooner or later, there is no bigger box, and the options to scale an RDBMS-backed system introduce either design complexity or time complexity. Sharding an RDBMS is hard to self-manage. And, although distributed SQL insulates users from that complexity by providing things like distributed cross commits behind the API to maintain consistency, that insulation also comes at a cost, which can be measured in the time complexity of the queries running across the distributed backend. At the same time, the cost of storage was falling and the promise of denormalized, low time-complexity queries in NoSQL was enticing, to say the least. Clearly, it was never going to get any cheaper to operate a relational database; it was only going to get more expensive. Thus, Amazon made the decision to undertake what may be the largest technology migration ever attempted and deprecate RDBMS technology in favor of NoSQL for all Tier 1 services.

A new approach to building NoSQL skills

Project Rolling Stone launched with great fanfare and had buy-in from all the right stakeholders. But things didn’t go well at first. Amazon’s developers were now using a database designed to operate without the complex queries they had always relied on, and the lack of in-house NoSQL data modeling expertise was crippling the migration effort. The teams lacked the skills needed to design efficient data models, so the early results from prototyped solutions were far worse than anticipated. To correct this situation, leadership created a center of excellence to define best practices and educate the broad Amazon technical organization; the NoSQL Blackbelt Team was formed under my leadership. The challenge before us was enormous. We had limited resources with global scope across an organization of more than 25,000 technical team members.
The traditional technical training approach built on workshops, brown bags and hackathons did not deliver the required results because the Amazon organization lacked a core nucleus of NoSQL skills to build on. Additionally, traditional training tends to be sandboxed around canned problems that are often not representative of what the developers are actually working on. As a result, technical team members were completing those exercises without significant insight into how to use NoSQL for their specific use cases. To correct this situation, we reworked the engagement model. Instead of running workshops and hackathons, we used the actual solutions the teams were working on as the learning exercises. The Blackbelt Team executed a series of focused engagements across Amazon development centers, where we delivered technical brown bag sessions to advocate best practices and design patterns. Instead of running canned workshops, however, we scheduled individual design reviews with teams to discuss their specific workloads and prototype a data model they could then iterate on. The result was powerful. Teams gained actionable information they could build on, rather than general knowledge that might or might not be relevant to their use case. During the next three years, Amazon migrated all Tier 1 RDBMS workloads to NoSQL and reduced the infrastructure required to support those services by more than 50%, while still maintaining a high business growth rate. Watch Rick Houlihan’s full MongoDB World 2022 presentation, “From RDBMS to NoSQL at Enterprise Scale.” When to use NoSQL - Looking at Access Patterns When should you use NoSQL? I had to answer this question many times at Amazon, and the answer isn’t so clear-cut. A relational database is agnostic to the access pattern. It doesn’t care what questions you ask. You don’t have to know code, although some people would argue that SQL is code. You can theoretically ask a simple question and get your data. 
Relational systems do that by being agnostic to every access pattern and by optimizing for none of them. The reality is that the code we write never asks random questions. When you write code, you’re doing it to automate a process: to run the same query a billion times a day, not to run a thousand random queries. Thus, if you understand the access patterns, you can start doing things with the data to create structures that are much easier for systems to retrieve while doing less work. This is the key. The only way to reduce the cost of data processing and the amount of infrastructure deployed is to do less work. OLTP (online transaction processing) applications are really the sweet spot for NoSQL databases. You’ll see the most cost efficiency here because you can create data models that mirror your access patterns and representative data structures that mirror your objects in the application layer. The idea is to deliver a system that is very fast at the high-velocity access patterns that make up the majority of your workload. I talk more about data access patterns and data modeling in a recent Ask Me Anything.

Making It All Work

There’s a saying that goes, “Data is like garbage. You better know what you are going to do with it before you collect it.” This is where relationships come into play. Nonrelational data, to me, does not exist. I’ve worked with more than a thousand customers and workloads, and I’ve never seen an example of nonrelational data. When I query data, relationships become defined by the conditions of my query. Every piece of data we’re working with has some sort of structure. It has schema, and it has relationships; otherwise, we wouldn’t care about it. No matter what application you’re building, you’re going to need some kind of entity relationship diagram (ERD) that describes your logical data diagram, entities and how they’re related to understand how to model it.
Otherwise, you’re just throwing a bunch of bytes in a bucket and randomly selecting things. A relationship always exists between these things. In relational models, they’re typically modeled in third normal form (3NF). For example, in a typical product catalog, you’ll see one-to-one relationships between products and books, products and albums, products and videos, one-to-many relationships between albums and tracks, and many-to-many relationships between videos and actors. This is a pretty simple ERD; we’re not even talking about any complex patterns. But if you want to get a list of all your products, you have to run three different queries with various levels of joins. That’s a lot of things going on.

In a NoSQL database, you’re going to take all those rows and collapse them into objects. If you think about the primary access pattern of this workload, it’s going to be something like, “Get me the product by this ID,” or “Get me all the titles under this category.” Whenever you want the product, you typically want all the data for the product because you’re going to use it in a product window. If you put it all in one document, you no longer have to join those documents or rows. You can just fetch the data by product ID. If you think about what’s happening from a time-complexity perspective, when you have all that data in tables, your one-to-one joins won’t be so bad, but with a one-to-many, the time complexity starts expanding. Again, the examples mentioned here are fairly simple. When you start getting into nested joins, outer and inner, and other more complex SQL statements, you can imagine how much worse the time complexity becomes. That’s your CPU burning away, assembling data across tables. If you’re running a relational data model and you’re joining tables, that’s a problem.

Index and conquer

Let’s think about how we model those joins in NoSQL. To start, we have a key-value lookup on a product.
But we can also create an array of embedded documents called “target” that contains all the things the product is related to, as shown in Figure 1. It contains metadata and anything about the product you need when you query by product ID. Now that we’re using embedded documents, there’s no join cost left to pay at read time. It’s all an index lookup. It can be one-to-one, one-to-many, or many-to-many; it doesn’t matter. As long as the aim is “get the document,” it’s still an index lookup.

Figure 1: Creating an array called “target” eliminates the need to join data from different rows, columns or tables.

Of course, a lot more goes into an application than an index lookup. Remember, 70% of our access patterns at Amazon were for a single row of data, and 20% were for a range of rows on a single table. For more complex access patterns, we’re going to need more dimensions. If, for example, we’re running a query for all the books by a given author or all people related to “x,” this will require adding more dimensions, or documents, to the collection. We can create documents for other actors who were in a collection of movies, directors of all the movies, songs from the movies, how these songs relate to other entities in this collection, writers, producers, artists who performed the songs and all the albums those songs appeared on, as shown in Figure 2.

Figure 2: Create more dimensions by adding documents to the collection.

Now, if I index the “target” array (multikey indexing of arrays is one of the best things about MongoDB and document databases), I can create essentially a B-tree lookup structure of those “target” IDs and join all those documents and all of those dimensions, as shown in Figure 3. Now I can select, for example, where target ID is Mary Shelley and get everything she’s related to: the books, people, critiques of her work. Where the target ID is a song title, I can get all the information about that song.
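The idea of indexing the “target” array can be sketched in a few lines of plain Python. This is only a model of the technique under stated assumptions: the documents, field names, and IDs are hypothetical, and a dictionary stands in for the B-tree that a real database would maintain. What it shows is that one document produces one index entry per array element, so a single lookup on a target ID returns every related document.

```python
from collections import defaultdict

# Hypothetical documents sharing one collection. The field names and
# IDs are illustrative, not taken from the article's figures.
collection = [
    {"_id": "book:frankenstein", "type": "book",
     "target": [{"id": "Mary Shelley"}, {"id": "Frankenstein"}]},
    {"_id": "person:mary-shelley", "type": "person",
     "target": [{"id": "Mary Shelley"}]},
    {"_id": "critique:1", "type": "critique",
     "target": [{"id": "Mary Shelley"}, {"id": "Frankenstein"}]},
    {"_id": "song:example", "type": "song",
     "target": [{"id": "Example Song"}]},
]


def build_target_index(docs):
    """Map every target ID to the documents whose array contains it.

    One document yields one index entry per array element, which is the
    idea behind a multikey index. A dict stands in for the B-tree here.
    """
    index = defaultdict(list)
    for doc in docs:
        for entry in doc.get("target", []):
            index[entry["id"]].append(doc)
    return index


def find_by_target(index, target_id):
    """Return every document related to target_id with one index lookup."""
    return index.get(target_id, [])


target_index = build_target_index(collection)
related = find_by_target(target_index, "Mary Shelley")
related_ids = [doc["_id"] for doc in related]
```

The lookup returns the book, the person, and the critique together without visiting any other table, which is the sense in which the index acts as the join.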
Figure 3: Multikey arrays create what is essentially a B-tree lookup structure, joining all related documents.

Essentially, we’re using the index as a join mechanism, which is a critical distinction in NoSQL. At AWS, many teams came to me and told me that NoSQL doesn’t work. The key thing to understand, however, is that if you index documents that are stored in the same table or collection on a common dimension that has the same value, you’ve essentially eliminated the need to join on that value across multiple tables. That’s what the relational database does. You don’t want to join unindexed columns in a relational database because it will incur a lot of overhead. You want to index those attributes and put pointers to parent objects and child tables and then join on those IDs. With NoSQL, we’re essentially placing all those items in a single table and indexing on the ID. This approach also removes the join-time complexity: If all those documents share a common table, and they’re indexed on a common attribute, the time complexity of the lookup is O(log N). Seventy percent of the overhead of handling a request from a database is not getting the data. It’s managing the connection, marshaling the data and moving it back and forth across the TCP/IP stack. So, if I can eliminate one request from a transaction, I’m going to reduce the overhead of that transaction.

Conclusion

Data that is accessed together should be stored together. That is the mantra that we’ve always espoused at MongoDB. Once we started learning how to use NoSQL at Amazon, we started having better results. We did that through regularly scheduled training sessions where we could teach the fundamentals of NoSQL using our own workloads. That’s what my developer advocacy team at MongoDB does now with customers. We provide templates for how to model data for their workloads to help them do it themselves.
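To make “stored together” concrete, here is a minimal sketch that loads the same product from a normalized layout and from a single document, counting key lookups along the way. The `CountingStore` class, the table contents, and the function names are assumptions made for this illustration, and the lookup count is only a rough proxy for the work a database does, not a measurement of real requests.

```python
class CountingStore:
    """Dict-backed table that counts key lookups as a rough proxy for work."""

    def __init__(self, rows):
        self._rows = rows
        self.lookups = 0

    def get(self, key):
        self.lookups += 1
        return self._rows[key]


# Normalized layout: one hypothetical table per entity.
products = CountingStore({"p1": {"name": "Example Album", "album_id": "a1"}})
albums = CountingStore(
    {"a1": {"artist": "Example Artist", "track_ids": ["t1", "t2"]}}
)
tracks = CountingStore(
    {"t1": {"title": "Track One"}, "t2": {"title": "Track Two"}}
)

# Document layout: the same data stored together under the product ID.
documents = CountingStore({
    "p1": {
        "name": "Example Album",
        "artist": "Example Artist",
        "tracks": ["Track One", "Track Two"],
    }
})


def load_product_normalized(product_id):
    """Assemble the product by following IDs across three tables."""
    product = products.get(product_id)
    album = albums.get(product["album_id"])
    titles = [tracks.get(track_id)["title"] for track_id in album["track_ids"]]
    return {"name": product["name"], "artist": album["artist"], "tracks": titles}


def load_product_document(product_id):
    """Fetch the whole product with one lookup by ID."""
    return documents.get(product_id)


from_tables = load_product_normalized("p1")
from_document = load_product_document("p1")
normalized_lookups = products.lookups + albums.lookups + tracks.lookups
document_lookups = documents.lookups
```

Both paths return the same product, but the normalized path touches four rows across three tables while the document path touches one, and the gap widens as one-to-many relationships grow.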
MACH Aligned for Retail: Cloud-Native SaaS
MongoDB is an active member of the MACH Alliance , a non-profit cooperation of technology companies fostering the adoption of composable architecture principles promoting agility and innovation. Each letter in the MACH acronym corresponds to a different concept that should be leveraged when modernizing heritage solutions and creating brand-new experiences. MACH stands for Microservices, API-first, Cloud-native SaaS, and Headless. In previous articles in this series, we explored the importance of Microservices and the API-first approach. Here, we will focus on the third principle championed by the alliance: Cloud-native SaaS. Let’s dive in. What is cloud-native SaaS? Cloud-native SaaS solutions are vendor-managed applications developed in and for the cloud, and leveraging all the capabilities the cloud has to offer, such as fully managed hosting, built-in security, auto-scaling, cross-regional deployment, automatic updates, built-in analytics, and more. Why is cloud-native SaaS important for retail? Retailers are pressed to transform their digital offerings to meet rapidly shifting consumer needs and remain competitive. Traditionally, this means establishing areas of improvement for your systems and instructing your development teams to refactor components to introduce new capabilities (e.g., analytics engines for personalization or mobile app support) or to streamline architectures to make them easier to maintain (e.g., moving from monolith to microservices). These approaches can yield good results but require a substantial investment in time, budget, and internal technical knowledge to implement. Now, retailers have an alternative tool at their disposal: Cloud-native SaaS applications. These solutions are readily available off-the-shelf and require minimal configuration and development effort. 
Adopting them as part of your technology stack can accelerate the transformation and time to market of new features, while not requiring specific in-house technical expertise. Many cloud-native SaaS solutions focused on retail use cases are available (see Figure 1), including Vue Storefront , which provides a front-end presentation layer for ecommerce, and Amplience , which enables retailers to customize their digital experiences. Figure 1: Some MACH Alliance members providing retail solutions. At the same time, in-house development should not be totally discarded, and you should aim to strike the right balance between the two options based on your objectives. Figure 2 shows pros and cons of the two approaches: Figure 2: Pros and cons of cloud-native SaaS and in-house approaches. MongoDB is a great fit for cloud-native SaaS applications MongoDB’s product suite is cloud-native by design and is a great fit if your organization is adopting this principle, whether you prefer to run your database on-premises, leveraging MongoDB Community and Enterprise Advanced , or as SaaS with MongoDB Atlas . MongoDB Atlas, our developer data platform, is particularly suitable in this context. It supports the three major cloud providers (AWS, GCP, Azure) and leverages the cloud platforms’ features to achieve cloud-native principles and design: Auto-deployment & auto-healing: DB clusters are provisioned, set up, and healed automatically, reducing operational and DBA efforts. Automatically scalable: Built-in auto-scaling capabilities enable the database RAM, CPU, and storage to scale up or down depending on traffic and data volume. A MongoDB Serverless instance allows abstracting the infrastructure even further, by paying only for the resources you need. 
Globally distributed: The global nature of the retail industry requires data to be efficiently distributed to ensure high availability and compliance with data privacy regulations, such as GDPR, while implementing strict privacy controls. MongoDB Atlas leverages the flexibility of the cloud with its replica set architecture and multi-cloud support, meaning that data can be easily distributed to meet complex requirements. Secure from the start: Network isolation, encryption, and granular auditing capabilities ensure data is only accessible to authorized individuals, thereby maintaining confidentiality. Always up to date: Security patches and minor upgrades are performed automatically with no intervention required from your team. Major releases can be integrated effortlessly, without modifying the underlying OS or working with package files. Monitorable and reliable: MongoDB Atlas distributes a set of utilities that provides real-time reporting of database activities to monitor and improve slow queries, visualize data traffic, and more. Backups are also fully managed, ensuring data integrity. Independent Software Vendors (ISVs) increasingly rely on capabilities like these to build cloud-native SaaS applications addressing retail use cases. For example, Commercetools offers a fully managed ecommerce platform underpinned by MongoDB Atlas (see Figure 3). Their end-to-end solution provides retailers with the tools to transform their ecommerce capabilities in a matter of days, instead of building a solution in-house. Commercetools is also a MACH Alliance member, fully embracing composable architecture paradigms explored in this series. Adopting Commercetools as your ecommerce platform of choice lets you automatically scale your ecommerce as traffic increases, and it integrates with many third-party systems, ranging from payment platforms to front-end solutions. 
Additionally, its headless nature and strong API layer allow your front-end to be adapted based on your brands, currencies, and geographies. Commercetools runs on and natively ingests data from MongoDB. Leveraging MongoDB for your other home-grown applications means that you can standardize your data estate, while taking advantage of the many capabilities that the MongoDB data platform has to offer. The same principles can be applied to other SaaS solutions running on MongoDB. Figure 3: MongoDB Atlas and Commercetools capabilities. Find out more about the MongoDB partnership with Commercetools . Learn how Commercetools enabled Audi to integrate its in-car commerce solution and adapt it to 26 countries . MongoDB supports your home-grown applications MongoDB offers a powerful developer data platform, providing the tools to leverage composable architecture patterns and build differentiating experiences in-house. The same benefits of MongoDB’s cloud-native architecture explored earlier are also applicable in this context and are leveraged by many retailers globally, such as Conrad Electronics, running their B2B ecommerce platform on MongoDB Atlas . Summary Cloud-native principles are an essential component of modern systems and applications. They support ISVs in developing powerful SaaS applications and can be leveraged to build proprietary systems in-house. In both scenarios, MongoDB is strongly positioned to deliver on the cloud-native capabilities that should be expected from a modern data platform. Stay tuned for our final blog of this series on Headless and check out our previous blogs on Microservices and API-first .
How a Data Mesh Facilitates Open Banking
Open banking shows signs of revolutionizing the financial world. In response to pressure from regulators, consumers, or both, banks around the world continue to adopt the central tenet of open banking: Make it easy for consumers to share their financial data with third-party service providers and allow those third parties to initiate transactions. To meet this challenge, banks need to transition from sole owners of financial data and the customer relationship to partners in a new, distributed network of services. Instead of competing with other established banks, they now compete with fintech startups and other non-bank entities for consumer attention and the supply of key services. Despite fundamental shifts in both the competition and the customer relationship, however, open banking offers a huge commercial opportunity, which we’ll look at more closely in this article. After all, banks still hold the most important currency in this changing landscape: trust. Balancing data protection with data sharing Established banks hold a special position in the financial system. Because they are long-standing, heavily regulated, and backed by government agencies that guarantee deposits (e.g., the FDIC in the United States), established banks are trusted by consumers over fintech startups when it comes to making their first forays into open banking. A study by Mastercard of 4,000 U.S. and Canadian consumers found that the majority (55% and 53%, respectively) strongly trusted banks with their financial data. Only 32% of U.S. respondents and 19% of Canadians felt the same way about fintech startups. This position of trust extends to the defensive and risk-averse stance of established banks when it comes to sharing customer data. Even when sharing data internally, these banks have strict, permission-based data access controls and risk-management practices. They also maintain extensive digital audit trails. 
Open banking challenges these traditional data access practices, however, causing banks to move to a model where end customers are empowered to share their sensitive financial data with a growing number of third parties. Some open banking standards, such as Europe’s Payment Services Directive (PSD2), specifically promote informed consent data sharing, further underlining the shift to consumers as the ultimate stewards of their data. At the same time, banks must comply with evolving global privacy laws, such as Europe’s General Data Protection Regulation (GDPR). These laws add another layer of risk and complexity to data sharing, granting consumers (or “data subjects” in GDPR terms) the right to explicit consent before data is shared, the right to withdraw that consent, data portability rights, and the right to erasure of that data — the famed “right to be forgotten.” In summary, banks are under pressure from regulators and consumers to make data more available, and customers now make the final decision about which third parties will receive that data. Banks are also responsible for managing: Different levels of consent for different types of data The ability to redact certain sensitive fields in a data file, while still sharing the file Compliance with data privacy laws, including "the right to be forgotten" The open opportunity for banks In spite of the competition and added risks for established banks, open banking greatly expands the global market of customers, opens up new business models and services, and creates new ways to grow customer relationships. In an open banking environment, banks can leverage best-of-breed services from third parties to bolster their core banking services and augment their online and mobile banking experiences. Established banks can also create their own branded or “white label” services, like payment platforms, and offer them as services for others to use within the open banking ecosystem. 
For customers, the ability of third parties to get access to a true 360-degree view of their banking and payment relationships creates new insights that banks would not have been able to generate with just their own data. Given the risks, and the huge potential rewards, how do banks satisfy the push and pull of data sharing and data protection? How do they systematically collect, organize, and publish the most relevant data from across the organization for third parties to consume? Banks need a flexible data architecture that enables the deliberate collection and sharing of customer data both internally and externally, coupled with fine-grained access, traceability, and data privacy controls down to the individual field level. At the same time, this new approach must also provide a speed of development and flexibility that limits the cost of compliance with these new regulations and evolving open banking standards. Rise of the data mesh Open banking requires a fundamental change in a bank’s data infrastructure and its relationship with data. The technology underlying the relational databases and mainframes in use at many established banks was first developed in the 1970s. Conceived long before the cloud computing era, these technologies were never intended to support the demands of open banking, nor the volume, variety, and velocity of data that banks must deal with today. Banks are overcoming these limitations and embracing open banking by remodeling their approach to data and by building a data mesh using a modern developer data platform. What is a data mesh? A data mesh is an architectural framework that helps banks decentralize their approach to sharing and governing data, while also enabling self-service consumption of that data. It achieves this by grouping a bank’s data into domains. Each domain in a data mesh contains related data from across the bank. 
For example, a "consumer" domain may contain data about accounts, addresses, and relationship managers from across every department of the bank. Each data domain is owned by a different internal stakeholder group or department within the bank, and these owners are responsible for collecting, cleansing, and distributing the data in their domain across the enterprise and to consumers. With open banking, domain owners are also responsible for sharing data to third parties. This decentralized, end-to-end approach to data ownership encourages departments within the bank to adopt a “product-like” mentality toward the data within their domain, ensuring that it is maintained and made available like any other service or product they deliver. For this reason, the term data-as-a-product is synonymous with data mesh. Data domain owners are also expected to: Create and maintain relevant reshaped copies of data, rather than pursue a single-source-of-truth or canonical model. Serve data by exposing data product APIs. This means doing the cleansing and curation of data as close as possible to the source, rather than moving data through complex data pipelines to multiple environments. The successful implementation of a data mesh, and the adoption of a data-as-a-product culture, requires a fundamental understanding of localized data. It also requires proper documentation, design, management, and, most important, flexibility, as in the ability to extend the internal data model. The flexibility of the document model is, therefore, critical for success. Conclusion Open banking holds great potential for the future of the customer experience, and will help established financial institutions meet the ever-evolving customer expectations. Facilitated by a data mesh, you can open new doors for responsible, efficient data sharing across your financial institution, and this increase in data transparency leads to better outcomes for your customers—and your bottom line. 
Want to learn more about the benefits of open banking? Watch the panel discussion Open Banking: Future-Proof Your Bank in a World of Changing Data and API Standards.
What’s New in Atlas Charts: Streamlined Data Sources
We’re excited to announce a major improvement to managing data sources in MongoDB Atlas Charts : Atlas data is now available for visualization automatically, with zero setup required. Every visualization relies on an underlying data source. In the past, Charts made adding Atlas data as a source fairly straightforward, but teams still needed to manually choose clusters and collections from which to power their dashboards. Streamlined data sources , however, eliminates the manual steps required to add data sources into Charts. This feature further optimizes your data visualization workflow by automatically making clusters, serverless instances, and federated database instances in your project available as data sources within Charts. For example, if you start up a new cluster or collection and want to create a visual quickly, you can simply go into one of your dashboards and start building a chart immediately. Check out streamlined data sources in action: See how the new data sources experience streamlines your data visualization workflow in Charts. Maintain full control of your data Although all project data will be available automatically to project members by default, we know how important it is to be able to control what data can be used by your team. For example, you may have sensitive customer data or company financials in a cluster. Project owners maintain full control over limiting access to data like this when needed. As shown in the following image, with a few clicks, you can select any cluster or collection, confirm whether or not any charts are using a data source, and disconnect when ready. If you have collections that you want some of your team to access but not others, this can be easily achieved under Data Access in collection settings as seen in the following image. With every release, our goal is to make visualizing Atlas data more frictionless and powerful. The Streamlined data sources feature helps us take a big step in this direction. 
Building data visualizations just got even easier with Atlas Charts. Give it a try today! New to Atlas Charts? Get started today by logging into or signing up for MongoDB Atlas, deploying or selecting a cluster, and activating Charts for free.
How to Leverage Enriched Queries with MongoDB 6.0
MongoDB introduces useful new functions and features with every release, and MongoDB 6.0, released this summer, offers many notable improvements, including deeper insights from enriched queries via the MongoDB Query API. This set of query enhancements was announced at MongoDB World 2022 by senior product manager Katya Kamenieva. You can watch her presentation below. Watch Katya Kamenieva’s MongoDB World presentation on queries. Users can now use upgraded operators and change stream features. In this post, we’ll look at several of these updates, along with examples of how you can put them to use. Top N accumulators With this new feature, users can compute top items in each group based on the sort criteria ($topN, $bottomN), current order of documents ($firstN, $lastN), or value of a field ($maxN, $minN). This functionality would be useful, for example, if you have a collection of restaurants with ratings, and you want to see the top three highest-rated restaurants based on the type of cuisine. You can group by cuisine and use $topN to return the top three restaurants by rating. Ability to sort arrays The new $sortArray operator lets users sort the elements of an array. For example, suppose you have posted content with hundreds of user comments, and you want to sort the comments based on how many likes they received. In this case, $sortArray can pull those comments and prioritize them to the top of the comments list. Densification and gap-filling These new additions to the aggregation framework help to build out time series data more completely. When attempting to create histograms of data over time, the new stages, $densify and $fill, allow you to fill gaps in that data to create smoother and more complete graphs using linear interpolation, last/next observed value carried forward, or a constant value. 
This capability can be helpful, for example, if you want to create a graph that shows the amount of inventory in a warehouse every day for a year, but the inventory was only recorded once a week. The $densify stage will fill the gaps in the timeline, while $fill will produce values for the inventory data based on the previous observation. Joining sharded collections With this new feature, when joining collections using $lookup or performing a recursive search with $graphLookup, collections on both sides can be sharded. Before 6.0, only the originating collection could be sharded. An example use case is enriching records in the “accounts” collection with the list of the corresponding orders that are stored in the “orders” collection. In the past, only the “accounts” collection could be sharded. Starting with 6.0, both the “accounts” and “orders” collections can be sharded. Change streams pre- and post-images Change streams now offer point-in-time (PIT) pre- and post-image capabilities, allowing users to include the state of the document before and after changes in the output of the change stream. This functionality can be useful in many situations. For example, suppose a company is tracking flight times. If a flight is delayed, the system can compare the value of the departure and arrival times both before and after that delay and trigger an automatic rewrite of the schedule for the new flight timeline, including schedules for the entire crew. Atlas Search across multiple collections This improvement to MongoDB Atlas Search allows users to search across multiple collections with a single query using $search inside the $unionWith or $lookup stages. $search can provide these results quickly, using only one query. Enriched queries are not the only improvements in MongoDB 6.0. Read about the 7 reasons to upgrade to MongoDB 6.0 and discover the possibilities.
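The restaurant, comment, and warehouse scenarios described in this post can be sketched as aggregation pipelines. The Python below writes them as plain lists of dictionaries, the form PyMongo accepts. The field names (`cuisine`, `name`, `rating`, `comments`, `likes`, `date`, `quantity`) are assumptions made for illustration, and `densify_and_fill` is a hypothetical plain-Python emulation included only to show what the last pipeline is expected to produce.

```python
from datetime import date, timedelta

# Top three restaurants per cuisine; $topN is used as a $group accumulator.
top_three_by_cuisine = [
    {"$group": {
        "_id": "$cuisine",
        "top_restaurants": {
            "$topN": {
                "n": 3,
                "sortBy": {"rating": -1},
                "output": ["$name", "$rating"],
            }
        },
    }}
]

# Sort each post's embedded comments by likes, most-liked first.
comments_by_likes = [
    {"$set": {"comments": {
        "$sortArray": {"input": "$comments", "sortBy": {"likes": -1}}
    }}}
]

# Add one document per missing day, then carry the last observed
# quantity forward ("locf") into the new documents.
daily_inventory = [
    {"$densify": {
        "field": "date",
        "range": {"step": 1, "unit": "day", "bounds": "full"},
    }},
    {"$fill": {
        "sortBy": {"date": 1},
        "output": {"quantity": {"method": "locf"}},
    }},
]


def densify_and_fill(readings):
    """Emulate $densify + $fill (locf) on a {date: quantity} mapping."""
    days = sorted(readings)
    filled, last_seen = {}, None
    current = days[0]
    while current <= days[-1]:
        last_seen = readings.get(current, last_seen)
        filled[current] = last_seen
        current += timedelta(days=1)
    return filled


# Inventory recorded once a week becomes one value per day.
weekly = {date(2022, 1, 3): 120, date(2022, 1, 10): 95}
daily = densify_and_fill(weekly)
```

With PyMongo, each pipeline would be passed to `collection.aggregate(...)` against a MongoDB 6.0 deployment. Swapping `"locf"` for `"linear"` in the `$fill` stage requests linear interpolation instead.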
MongoDB Connector for Apache Kafka 1.8 Available Now
MongoDB has released version 1.8 of the MongoDB Connector for Apache Kafka with new monitoring and debugging capabilities. In this article, we’ll highlight key features of this release. JMX monitoring The MongoDB Connector works with Apache Kafka Connect to provide a way for users to easily move data between MongoDB and Apache Kafka. The MongoDB connector is written in Java and now implements Java Management Extensions (JMX) interfaces that allow you to access metrics reporting. These metrics will make troubleshooting and performance tuning easier. JMX technology, which is part of the Java platform, provides a simple, standard way for applications to provide metrics reporting with many third-party tools available to consume and present the data. For those who might not be familiar with JMX monitoring, let’s look at a few key concepts. An MBean is a managed Java object, which represents a particular component that is being measured or controlled. Each component can have one or more MBean attributes. The MongoDB Connector for Apache Kafka publishes MBeans under the “com.mongodb.kafka.connector” domain. Many open source tools are available to monitor JMX metrics, such as the console-based JmxTerm or the more feature-complete monitoring and alerting tools like Prometheus. JConsole is also available as part of the Java Development Kit (JDK). Note: Regardless of your client tool, MBeans for the connector are only available when there are active source or sink configurations defined on the connector. Visualizing metrics Figure 1: Source task JMX metrics from JConsole. Figure 1 shows some of the metrics exposed by the source connector using JConsole. In this example, a sink task was created and by default is called “sink-task-0”. The applicable metrics are shown in the JConsole MBeans panel. A complete list of both source and sink metrics will be available in the MongoDB Kafka Connector online documentation shortly after the release of 1.8. 
MongoDB Atlas is a great platform to store, analyze, and visualize monitoring metrics produced by JMX. If you’d like to try visualizing JMX metrics in MongoDB Atlas generated by the connector, check out jmx2mongo . This tool continuously writes JMX metrics to a MongoDB time series collection. Once the data is in MongoDB Atlas, you can easily create charts from the data like the following: Figure 2: MongoDB Atlas Chart showing successful batch writes vs writes greater than 100ms. Figure 2 shows the number of successful batch writes performed by a MongoDB sink task and the number of those batch writes that took longer than 100ms to execute. There are many other monitoring use cases available; check out the latest MongoDB Kafka Connector documentation for more information. Extended debugging Over the years, the connector team collected requests from users to enhance error messages or provide additional debug information for troubleshooting. In 1.8, you will notice additional log messages and more descriptive errors. For example, before 1.8, if you set the copy.existing parameter, you may get the log message: “Shutting down executors.” This message is not clear. To address this lack of clarity, the message now reads: “Finished copying existing data from the collection(s).” These debugging improvements in combination with the new JMX metrics will make it easier for you to gain insight into the connector and help troubleshoot issues you may encounter. If you have ideas for additional metrics or scenarios where additional debugging messages would be helpful, please let us know by filing a JIRA ticket . For more information on the latest release, check out the MongoDB Kafka Connector documentation . To download the connector, go to the MongoDB Connector repository in GitHub or download from the Confluent Hub .
Network, Build, and Learn at MongoDB.local Events — Now Free to Attend
Panel Discussion at MongoDB.local London, 2021 Every year, MongoDB hosts popular MongoDB.local events in major cities around the world. Packed with workshops, talks, and keynotes, these one-day, in-person gatherings bring together engineers, entrepreneurs, and executives from the surrounding area. This year, for the first time, admission to MongoDB.local events is free. (Note that admission is granted on a first-come, first-served basis, limited only by seating capacity.) Five upcoming events Five MongoDB.local events are scheduled for the remainder of 2022, and you can register for the .local event near you through the links below or through the MongoDB.local hub page . Frankfurt , September 27, 2022 San Francisco , October 20, 2022 Dallas , October 27, 2022 London , November 15, 2022 Toronto , December 15, 2022 From sessions on the future of serverless to demos of next-generation technology, here’s what to expect at a MongoDB.local event near you. Learn from the experts Whether you attend keynote presentations or participate in customer discussions, you can tap into a wealth of knowledge from people and organizations that are thoroughly familiar with today’s technology landscape. You’ll learn from MongoDB experts, who will share hard-earned knowledge, practical solutions, and technical insight based on firsthand experience with common issues. You can also attend talks from MongoDB customers, which are generally centered around a specific use case and solution — a sort of shared retrospective for the public. At .local Frankfurt, for example, an engineer from Bosch will discuss the company’s evolution from individual documents to time series data in an IoT environment. All MongoDB.locals include sessions for a wide array of skill levels and specialities, such as a deep dive into the new Queryable Encryption feature or an introduction to building a basic application using Atlas Device Sync and React. 
These workshops offer practical, actionable advice that you can implement immediately upon returning to your office. Expand your professional network MongoDB.local events also offer many opportunities to expand your personal and professional network. In particular, these gatherings are a great way to connect with members of your local MongoDB User Group, who are likely working with the same technologies (or facing similar challenges) that you are. Whether you’re searching for a new job or business opportunity, looking for tips and techniques to implement in your own environment, or just browsing for inspiration, you’ll likely find what you seek at MongoDB.local. Explore the latest products Product booths are another highlight of MongoDB.local events. Staffed by MongoDB product teams, these booths are where you can pick up limited edition stickers, discuss the latest developments with expert engineers, and see new MongoDB features in action. Every event also features booths where third-party partners, vendors, and allies demonstrate cutting-edge technology, show how their platforms and services work in tandem with MongoDB, and answer any questions you may have. Stop by these booths to explore the next big thing in data, see how MongoDB can provide new solutions for pressing problems, and come away with helpful, personalized advice for your own challenges. Enjoy a one-of-a-kind experience From Frankfurt’s Klassikstadt to London’s Tobacco Dock , MongoDB.locals are held at unique, memorable venues. Step inside refurbished historical sites, such as a former factory turned automobile museum or a shipping wharf converted into a top-tier event space. In addition to a full day of talks and tutorials, attendees can enjoy breakfast, lunch, snacks, and drinks served at MongoDB.locals. Join us for a day packed with learning and networking opportunities in a venue near you. 
Whether you’re a decision-maker or a developer, you’ll find something interesting, enlightening, or useful at MongoDB.local. Learn more about our upcoming MongoDB.local events in Frankfurt , San Francisco , Dallas , London , and Toronto , and register for your free ticket.
Built With MongoDB: Vanta Automates Security and Compliance for Fast-Growing Businesses
How to Use MongoDB Atlas to Make Your CRM More Efficient
As part of digital transformation, many companies want to optimize their internal business processes, gain more visibility into important business metrics, and create new automation routines. Data is always at the core of business processes and metrics, and most business-critical data is often located in one or a few repositories, such as a customer relationship management system (CRM). Historically, business users have relied on spreadsheets and enterprise data warehouses for bringing the data together and making decisions. These solutions can range from a disjointed set of dashboards to an all-in-one central console. But businesses that need to move fast need to iterate on their data and processes fast, and they can’t do that if implementing a change in the CRM takes months or if things are done manually in spreadsheets. This article describes how MongoDB Professional Services created an internal solution to address these issues. Our approach In MongoDB Professional Services, we also needed to streamline our business processes and get out of spreadsheets for business management, especially for revenue forecasting. As the organization grew, the amount of manual labor associated with spreadsheet maintenance became untenable, and making sense of the data became more difficult, especially when the data might be inconsistent, stale, or even inaccurate. Ordinarily, a good CRM or Professional Services Automation (PSA) system can help solve this problem. At MongoDB, for example, we use Salesforce, which provides decent flexibility, but also requires heavy customization and has limitations. We’ve also seen MongoDB customers address the problem by building ETL pipelines into MongoDB Atlas and taking advantage of MongoDB’s flexible schema, query language and aggregation framework, and Atlas Search. The data from source systems is ingested as-is or remapped to create a single view. 
The best approach we’ve found, however, is to optimize the schema for how the data will be consumed, with different parts of documents potentially coming from different source systems. Atlas App Services provides a serverless abstraction layer that allows fine-grained but flexible control over the schema to help you avoid conflicts and iterate without breaking compatibility. After considering alternatives, we created an internal CRM/PSA-augmenting system that is built on top of the MongoDB Atlas platform to provide us with additional capabilities and flexibility. This solution allows Professional Services to rapidly deliver advanced functionality, such as revenue forecasting, automation, and visibility into complex business metrics. The solution also allows Professional Services to address business systems' needs and promptly react to changes, with functionality beyond what is typically provided by other systems. MongoDB’s internal solution, at its core, is serverless and data-centric, leveraging Atlas App Services functions and triggers for processing the data and Atlas Search for full-text search. It uses Connector for BI , Atlas GraphQL API , and App Services wire protocol and Atlas Functions to access and manipulate data from other components. Its components include a React-based console application, Atlas Charts, Tableau dashboards, Google Sheets, and microservices for data import and integrations. Project view of our internal solution console. Revenue forecasting module in our internal solution console. MongoDB Charts shows business metrics. Solution architecture The data architecture in our internal solution builds on the single view approach and the data-mart concept. The main idea is to ingest relevant data from Salesforce and other systems, enrich it, and build on it quickly, as shown in the following image. 
We followed these eight key principles to help enable this functionality:

1. Focus on bringing in data in the form that makes the most sense for the business, and find the right balance between keeping the ETL easy and optimizing for the foreseen application use cases.
2. Apply transformations in the ETL process to make the ingested data intuitive, including document hierarchy, field names, and data types.
3. Clearly define the data lifecycle in terms of data producers and consumers. Data producers can overwrite only the documents and fields that they “own.” For example, the ETL process from the source system should overwrite the data in MongoDB documents as needed, but it should modify only those fields that actually come from the pipeline. Aim to structure MongoDB documents in a way that makes it clear which fields are owned by which producer. Atlas App Services schema and rules can help ensure that the most critical documents and fields are correctly accessed and modified.
4. Use Atlas Functions and the App Services wire protocol in applications and services, as opposed to connecting directly to the Atlas instance. This allowed us to use Google SSO in the console without requiring any sophisticated security mechanisms when we need to do regular CRUD operations from within the application.
5. For complex data logic and on-the-fly calculations, use App Functions.
6. Use database triggers for propagating changes and generating data-driven events. Use scheduled triggers for generating aggregated views and periodic work.
7. Use external services for communicating with the outside world (e.g., email sender, ETL job). The external services are invoked asynchronously by listening on change streams from their respective namespaces (pub-sub model). All external services work independently of each other.
8. Don’t overthink.
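The field-ownership principle above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the internal implementation: the producer names, the `OWNED_SECTIONS` mapping, and the helpers `flatten` and `owned_update` are hypothetical. The only MongoDB feature relied on is the `$set` update operator with dot-notation paths, which changes the named fields and leaves the rest of the document untouched.

```python
from typing import Any

# Hypothetical mapping of each data producer to the document sections it owns.
OWNED_SECTIONS: dict[str, tuple[str, ...]] = {
    "crm_etl": ("crm",),
    "forecasting": ("forecast",),
}


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into MongoDB dot-notation paths."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def owned_update(producer: str, changes: dict[str, Any]) -> dict[str, Any]:
    """Build a $set update, rejecting any field outside the producer's sections."""
    prefixes = OWNED_SECTIONS[producer]
    flat = flatten(changes)
    for path in flat:
        if not any(path == p or path.startswith(p + ".") for p in prefixes):
            raise PermissionError(f"{producer} does not own field {path!r}")
    return {"$set": flat}


# The ETL producer may refresh its own section of a document.
update = owned_update("crm_etl", {"crm": {"name": "Atlas onboarding"}})
```

With PyMongo, the resulting dictionary could be passed as the update argument of `Collection.update_one`. In the architecture described here, Atlas App Services schema and rules would be the place to enforce such constraints on the server side; the client-side check above only shows the idea.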
MongoDB Atlas’s developer data platform offers a lot of flexibility and, if these principles are followed, making changes and iterating on a working system is surprisingly easy. To reiterate the last point, our internal solution is easy to modify and extend because of MongoDB’s flexible schema and the independence of external components. Users can access the data through the available tools and integrations, and developers can update specific parts of the system or introduce new ones without delays, making this solution efficient in terms of both cost and effort.

Conclusion

Through this example of our internal solution, we demonstrated that by making full use of MongoDB Atlas, you can solve seemingly intractable business problems with speed, efficiency, and robustness, going beyond what regular systems can do. Whether you’re optimizing your company’s business processes, building business dashboards, or improving automation, the MongoDB Atlas developer data platform can help make the process easier. Learn how MongoDB’s consulting engineers can help you with design and architecture decisions and accelerate your development efforts. Contact us to learn more.