Engineering, Done DIRT Cheap: How an Outdated Data Architecture Becomes a Tax on Innovation

Mark Porter
November 24, 2021 | Updated: June 3, 2022

In March 2021, I wrote about The Innovation Tax: the idea that clunky processes and outdated technologies make it harder for engineering teams to produce excellent tech that delights customers. In the months since then, my thinking has evolved even further. I couldn’t have guessed how many technology leaders would immediately recognize these problems in their own organizations and share their own deep frustrations with me. This article combines that evolved thinking with the massive feedback the original piece received, and it offers actionable ways to decrease your tax burden. Who wouldn’t want that?

The innovation tax, like income tax, is real. Of course, it saps morale (with resulting attrition and churn), but it also has other financial and opportunity costs. Taxed organizations see their pace of innovation suffer as people and resources are locked into maintaining rather than innovating.

We named this tax DIRT. Why? Well, it’s rooted in data (D), because it so often springs from the difficulty of using legacy databases to support modern applications that require access to real-time data to create rich user experiences. It affects innovation (I), because your teams have little time to innovate if they’re constantly trying to figure out how to support a complex and rickety architecture. It’s recurring (R), because it’s not as if you pay the tax (T) once and get it over with. Quite the opposite. DIRT makes each new project ever more difficult because it introduces so many components, frameworks, and protocols that need to be managed by different teams of people.

In retrospect, it’s clear that technology leaders would recognize this tax and immediately grasp the degree to which it’s caused (or cured) by their data architecture. Data is sticky, strategic, heavy, and intricate, and it is the core of the modern digital company. Modern applications have much more sophisticated data requirements than the applications we were building only 10 years ago. Obviously, there is more data, but it’s more complicated than that: companies are expected to react more quickly and more cleverly to all of the signals in that data. Legacy technologies, including rigid, inefficient, hard-to-program, single-model relational databases, just don’t cut it.

In over 300 CxO conversations I've had since joining MongoDB in 2020, fewer than a handful of CTOs disputed this statement. When your tech stack can’t handle the demands of new applications, engineering teams will often bolt on single-purpose niche databases to do the job (think time series, text, graph, etc.). Then they’ll build a series of pipelines to move data back and forth. And everything will get slow and complicated — and even political. Time to polish up that LinkedIn profile.

If this were rare, it wouldn’t be such a big deal. But large enterprises can have hundreds or thousands of applications, each with their own sources of data and their own pipelines. Over time, as data stores and pipelines multiply, an organization’s data architecture starts to look like a plate of spaghetti. Soon you’re operating and maintaining an entire middleware layer of ETL, ELT, and streaming. The variety of technologies, each with their own frameworks, protocols, and sometimes languages, makes it harder for developers to collaborate. It makes it extremely difficult to scale, because every architecture is bespoke and brittle. Developers spend their precious “flow” hours doing integration work instead of building new applications and features that the business needs and customers will love. Enterprise architects often end up spending their time on all the wrong things.

It’s clear to me that most customers are ready for a new approach to data architecture. One of the best parts of my job is listening to and learning from other CxOs. Since the pandemic made it impossible to do that in person, MongoDB moved these discussions online, inviting technology leaders to hash out some of their biggest problems 1:1 and in groups with me. In one of those sessions, a CTO commented, “Technical debt should be carried on your CFO's balance sheet.” Even on Zoom, the power of that statement was clear.

We also started looking at slide decks about data architecture from some of the best-known venture capital firms. Understandably, VCs need to position each of their portfolio companies as a critical player in the data architecture of the future, but the overall vision was not compelling. One technology leader said, “When I look at 20 net-new technologies I need to learn, it’s terrifying.” Others commented that just looking at these architecture diagrams was a little off-putting, because they knew their own organization’s data architecture was at least that complicated already. They knew they needed to simplify their data architecture, but more than one admitted to postponing this work indefinitely because it was just too daunting. I recently met with a major health care company whose executives think such a simplification is only barely possible, but they are bravely diving in anyway, knowing that they must do it and that they’ll learn along the way as they tear down their monoliths.

In many cases, the innovation tax manifests as the inability to even consider new technology because the underlying architecture is too complex and difficult to maintain, much less understand and transform. This is why a lot of senior people at enterprise companies are sitting with their fingers in the transformation dike, waiting for retirement: they think they can’t modernize.

It won’t surprise you that we also saw how MongoDB, as a general purpose database able to handle all types of data at speed and scale, could help solve this problem. Let me be clear. I’ve been working on or with databases for my entire 35-year career, and I joined MongoDB for a reason: I believe we can build the database and application-building environment that I’ve wanted to create and use for at least 30 of those years. Our vision of MongoDB goes beyond our namesake database to a broader, more versatile data platform that allows you to accelerate and simplify how you build any type of application. It represents significant progress toward our larger goal, which remains the same as ever: to make data stunningly easy to work with. We want to see data become an enabler of innovation, not a blocker. And we want to finally allow technology teams to start to untangle their sprawl and get rid of their DIRT.

Where to start? It’s good to have a better understanding of just how DIRT might be holding your teams back. Do your developers have trouble collaborating because the development environment is so fragmented? Do schema changes take longer to roll out than the application changes they’re designed to support? Do you have trouble building 360-degree views of your customers? And if so, why? These are all good places to start digging in the DIRT.

You might also take a hard look at your applications and data sources, as well as what it would take to move your data onto a new data platform. That could mean identifying the objects in your applications and all the applications that interact with them. You could then assign a complexity score to each object based on characteristics such as its number of properties, methods, collections, and attributes. Now take a step back, identify each application that connects to each of those objects, and rank it based on how mission-critical it is, how many people rely on it, how many tasks it has to perform, and the complexity of those tasks. Once you have a better handle on all this complexity, you’ll be better positioned to create a plan to move off your legacy systems, perhaps starting with the least complex and least integrated data sources. Of course, your metrics and your mileage will vary, but the point is to start.
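The scoring exercise described above can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the class names, the 1-to-5 scales, the weights in `importance()`, and the unweighted complexity sum are assumptions chosen for illustration, and the sample objects and applications are invented. What it shows is the shape of the process (score each object, rank the applications that touch it, then start with the least complex and least integrated objects), not a recommended set of numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    """An application that connects to one or more legacy data objects."""
    name: str
    criticality: int      # hypothetical scale: 1 (nice to have) to 5 (mission-critical)
    users: int            # how many people rely on it
    tasks: int            # how many tasks it has to perform
    task_complexity: int  # hypothetical scale: 1 (simple) to 5 (very complex)

    def importance(self) -> int:
        # Illustrative weights only; choose ones that reflect your organization.
        return self.criticality * 10 + self.task_complexity * 5 + self.tasks + self.users // 100


@dataclass
class DataObject:
    """An object in a legacy application's data model."""
    name: str
    properties: int
    methods: int
    collections: int
    attributes: int
    apps: list[Application] = field(default_factory=list)  # applications that touch it

    def complexity(self) -> int:
        # Unweighted sum of structural counts: a simple, assumed metric.
        return self.properties + self.methods + self.collections + self.attributes


def rank_applications(apps: list[Application]) -> list[Application]:
    """Most important applications first."""
    return sorted(apps, key=lambda a: a.importance(), reverse=True)


def migration_order(objects: list[DataObject]) -> list[str]:
    """Move first: least complex, then least integrated, then lowest combined stakes."""
    def sort_key(obj: DataObject):
        stakes = sum(app.importance() for app in obj.apps)
        return (obj.complexity(), len(obj.apps), stakes, obj.name)
    return [obj.name for obj in sorted(objects, key=sort_key)]


# Invented sample data for illustration.
billing = Application("billing", criticality=5, users=2000, tasks=12, task_complexity=4)
reporting = Application("reporting", criticality=2, users=150, tasks=3, task_complexity=2)
objects = [
    DataObject("Invoice", 20, 8, 3, 10, apps=[billing, reporting]),
    DataObject("Region", 4, 1, 1, 2, apps=[reporting]),
    DataObject("Customer", 30, 12, 5, 15, apps=[billing]),
]
print(migration_order(objects))                                   # ['Region', 'Invoice', 'Customer']
print([a.name for a in rank_applications([reporting, billing])])  # ['billing', 'reporting']
```

Sorting by a tuple makes the priority explicit: complexity decides first, and the number of connected applications only breaks ties. If integration matters more than structural complexity in your environment, reorder the tuple.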

I don’t pretend any of this is easy. Like many of you, I’ve spent most of my career working on problems just like these. But that also means I know progress when I see it, and I’m seeing the beginning of a way for organizations to clean up their DIRT.

I’ll continue writing about these challenges and, I hope, adding some perspective. If you’re curious to learn more about DIRT, you can download our white paper. As always, I’m eager to have you tweet your alignment (or lack thereof) or other thoughts at @MarkLovesTech. You can also reach out to me at marklovestech.com, where you will find a compilation of my latest musings related to MongoDB and otherwise.

← Previous

The Power of Embracing Differences: My Journey to MongoDB

September 14th, 2021 marked my first full year at MongoDB, and what a year it’s been.

A bit about me

Hi, I’m Cara! I’m a Team Lead, Executive Assistant, specifically for Tech & Product. I’m based out of our NYC office and live in Jersey City with my girlfriend and our three cats. At MongoDB, I support our amazing Chief Product Officer and also lead a team of awesome Administrative Assistants (AAs) and Executive Assistants (EAs) within Tech & Product. We are hiring like crazy, too, and I can’t say enough great things about our team.

Beyond my already rewarding and challenging role as a Team Lead, I also get to work on other meaningful projects while growing my core career. I’m incredibly grateful and humbled to be a Global Lead for two of MongoDB’s affinity groups (known as employee resource groups at some companies) alongside some of the best, most passionate people I’ve ever met:

Queeries: A closed group and safe space for people who personally identify within the LGBTQIA+ spectrum.
The Queer Collective: An open group for the LGBTQIA+ community as well as our amazing allies (all are welcome!) to exchange thoughts and ideas and to learn and grow from each other.

As we like to say, the future is inclusive!

Finding my voice and professional purpose

The funny thing is, I didn’t know what an “affinity group” or “employee resource group” was for most of my career. I used to work in a more conservative corporate environment and spent over a decade in the food/hospitality industry with people whose views were wildly different from mine. One of my bosses always asked me if I had a boyfriend or when I was going to settle down with a nice guy. It was awkward and uncomfortable, but it was a discomfort I got used to. How sad is that? The crazy thing was, it didn’t feel sad or weird or anything at the time. I just thought I had to stay hidden at work. That’s what you did. It wasn’t “professional” to be gay.

The first time I saw a queer coworker was when I had my first real introduction to the tech start-up environment. He was so vibrantly open about who he was, and I was in awe of him. I stayed quiet for my first few months there and studied people’s reactions, interactions, and how they responded when he would say things that I never thought could be said in an office. They weren’t bad things by any means, but they were topics about being queer that I watched everyone embrace. Then, it slipped out during lunch one day. I thought maybe I could casually mention going on a date so it would be less weird, but everyone was super surprised. I get told I “look straight” a lot, which I’ve always found irritating. What does that even mean? Do I need to be masculine-presenting to be gay?

Me (right) and my girlfriend

From there, I moved on to work at Zocdoc, which truly opened my eyes to affinity groups, workplace queer communities, and how far their reach can extend. It was the first place I worked that even had an affinity group. I befriended two amazing humans there who were the founders of ZocPride, which represented Zocdoc’s queer community. We got to talking, and they told me they only planned something for Pride month. They’re not planners (they actually hate planning), but they didn’t want the group to die. So I said, “Good news. Hi, I’m Cara. I’m super queer and I love to plan things!” We chuckled, and then I immediately started planning and researching what I could do with this awesome gift I had just been given. Since we had no D&I team and a very limited budget, I worked to find other companies to partner with as well as vendors who would be open to sponsoring events for us. Before I knew it, we were partnering with Out in Tech to host an external panel discussion about queer access to healthcare. We hosted it on Coming Out Day and had about 300 guests. From there, things really took off.

We did a “spread the love” campaign for Valentine’s Day, had hugely successful fundraisers for NYC’s AIDS Walk, and then, you guessed it, went crazy for Pride. I proudly introduced the art of drag to Zocdoc and started their annual Drag Bingo Pride event. We also sponsored and had a booth at the Lesbians Who Tech Summit the year that Hillary Clinton came to speak. It was unbelievable.

My MongoDB journey

After receiving incredible offers to work at a few more companies, unexpectedly experiencing workplace discrimination, and reflecting on what I want and need to be happy and thrive in a work environment, I found myself at MongoDB. One of my amazing colleagues from Zocdoc was working here, and we were catching up. I heard the details about the company and the role and thought it sounded like a great fit! I love working in tech, but specifically with Product & Tech teams. They’re brilliant, passionate, quirky personalities that vibe well with mine and, in my experience, are hyper-focused on having fun and building a positive culture.

Because of my previous experiences, I knew exactly what I was looking for. I asked questions that some might find uncomfortable about the company’s commitment to Diversity & Inclusion, what it means to them personally, and how they practice what they preach. I didn’t want any more wooden nickels. The interview process was amazing. Everyone was super responsive, informative, and helpful and didn’t hesitate to answer any of my hard-hitting questions. Interviews are a two-way street, and I was immediately put at ease when I realized that MongoDB was the place for me. My recruiter started telling me about our growing D&I team, our affinity groups, and how involved and supportive the leadership team is. Then I got to interview with my manager, our Chief Product Officer, who I clicked with instantly. I knew right away that I wanted to work with him.

In my experience, I haven’t always been lucky with great bosses. I’ve been ignored, lied to, dismissed, overlooked, and simply not appreciated. I don’t feel that way here. I feel heard and respected, and that speaks volumes in itself. I’m often encouraged to take time for myself. I had some personal health issues at the beginning of the year. I was anxious to take time off because I was still so new, but the outpouring of support and understanding I received blew my mind. That’s when I knew I had really found my new home.

When I joined MongoDB last year, The Queer Collective was still a new group, only three months old at that point, and I was able to join at a very exciting time when there was lots of opportunity and momentum. We officially launched the group alongside the announcement of our first-ever celebration of (inter)national Coming Out Day. We celebrated again this year and have decided that it will be a company-wide annual tradition. Last year, four of our leads (myself included) shared their coming out stories, and we didn’t realize how much of an impact it made until feedback started to trickle in. We were told that some employees joined MongoDB after reading our stories and some even felt comfortable coming out of the closet and stepping into their own light. If that’s not rewarding, I don’t know what is.

This year, more employees shared their stories, and we partnered with our Benefits team to host an internal panel discussion. October is Mental Health Awareness Month, and we thought it would be the perfect time to talk through and bring awareness to the mental health journey that comes along with coming out and embracing your true, authentic self. We will also be planning a full week of impactful programming for Trans Awareness Week so that we can continue to amplify the voices in the Trans Community while encouraging continued education.

This past July, I also spoke at MongoDB.live (formerly known as MongoDB World) with my Queer Collective co-lead and dear friend Seán Carroll about allyship and how to upgrade to an active accomplice. The talk explored what accountability and support look like and how we can all improve our support of the LGBTQIA+ community. The feedback was amazing, and I can’t wait to evolve our topic and content and hopefully speak in person next year!

I also have the pleasure of working closely with our incredible D&I team on impactful initiatives, such as helping with large external events and partnerships like the Lesbians Who Tech Summit, where we secured a top-tier sponsorship at the largest queer tech event in the world! I’ve also been part of meaningful conversations about expanding gender and identity options and about evolving and planning benefits that support the queer community. The list goes on, really. I frequently sync with our D&I team, and I’m so grateful to work somewhere that truly invests in fostering an inclusive and equitable work environment.

Why MongoDB is the place for me

I’ve worked in a lot of different industries, with people from every level and walk of life, and now I feel as though I’m where I was meant to be. MongoDB’s values truly align with my own, and this is the first company that I’ve seen make an actual effort to align its objectives and goals with its values. Here’s how I live some of our MongoDB values every day:

I proudly embrace the power of everyone’s differences (mine included). We evolve and move forward with a magical combination of varied backgrounds, interests, and ideas.
Why bother doing anything if you don’t plan to make it matter? I stand behind everything I work on and am proud of the meaningful projects and impacts I’ve seen first-hand so far.
I’ve always been a big-idea kind of human (Think Big, Go Far!), and I thrive on creativity, ambition, and being a relentless dreamer.

When I joined, I received a postcard from our CEO. Part of it said, “We want your time here to become a real inflection point in your professional career,” and I can wholeheartedly say that after just my first year, it already is. I’m constantly learning and growing at MongoDB, with everything from management training to webinars to endless learning and development resources, and beyond. These were things I had been requesting and looking for at previous companies. They were promised to me “eventually,” but they never came. Here, in my first week at MongoDB, I was given them without asking. This is a company that truly cares about its employees’ development and success.

I’ve hired (and am growing) an awesome team of amazing humans who I’m so proud to work alongside every day. Any job can be great, but the people make it extra special. The EA team at MongoDB is like no other, and I can’t wait to see its continued growth and evolution. Helping to build and evolve a world-class EA org is incredibly exciting and rewarding, and I love being a part of it. I love that I can be fully myself at work and am given the opportunity to make an impact in so many ways. I can’t wait to see what the future will bring. It’s been an unbelievable experience and journey so far!

Interested in joining MongoDB? We have several open roles on our teams across the globe and would love for you to transform your career with us!

November 23, 2021

Next →

Retrieval Augmented Generation for Claim Processing: Combining MongoDB Atlas Vector Search and Large Language Models

Following up on our previous blog, AI, Vectors, and the Future of Claims Processing: Why Insurance Needs to Understand The Power of Vector Databases, we’ll pick up the conversation right where we left it. We discussed extensively how Atlas Vector Search can benefit the claim process in insurance and briefly covered Retrieval Augmented Generation (RAG) and Large Language Models (LLMs).

One of the biggest challenges for claim adjusters is pulling and aggregating information from disparate systems and diverse data formats. PDFs of policy guidelines might be stored in a content-sharing platform, customer information locked in a legacy CRM, and claim-related pictures and voice reports in yet another tool. All of this data is not just fragmented across siloed sources and hard to find but also stored in formats that have historically been nearly impossible to index with traditional methods. Over the years, insurance companies have accumulated terabytes of unstructured data in their data stores but have failed to capitalize on the possibility of accessing and leveraging it to uncover business insights, deliver better customer experiences, and streamline operations. Some of our customers even admit they’re not fully aware of all the data in their archives.

There’s a tremendous opportunity to leverage this unstructured data to benefit the insurer and its customers. Our image search post covered part of the solution to these challenges, opening the door to working more easily with unstructured data. RAG takes it a step further by integrating Atlas Vector Search and LLMs, allowing insurers to go beyond the limitations of baseline foundation models by making those models context-aware through proprietary data.
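Making a model context-aware by feeding it proprietary data boils down to prompt augmentation: retrieved text is placed in the prompt next to the user’s question. Here is a minimal Python sketch under stated assumptions: `build_augmented_prompt` is a hypothetical helper, each retrieved document is assumed to carry an `_id` and the `claimDescription` text field used by the claim schema in this post, the instruction wording is illustrative rather than taken from the linked repository, and the two sample claims are invented.

```python
def build_augmented_prompt(question: str, retrieved_docs: list[dict], max_docs: int = 3) -> str:
    """Combine the user's question with retrieved claim text into one LLM prompt.

    retrieved_docs: documents returned by the retrieval step (e.g. a vector search),
    assumed to be ordered from most to least relevant.
    """
    context_lines = [
        f"[claim {doc['_id']}] {doc['claimDescription']}"
        for doc in retrieved_docs[:max_docs]  # keep the prompt short: top matches only
    ]
    context = "\n".join(context_lines) or "(no related claims were found)"
    return (
        "You are helping an insurance claim adjuster.\n"
        "Answer the question using only the claims listed below, "
        "and cite the claim IDs you relied on.\n\n"
        f"Claims:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Invented sample documents standing in for vector search results.
docs = [
    {"_id": 101, "claimDescription": "Rear-end collision on a rain-soaked highway."},
    {"_id": 102, "claimDescription": "Windshield cracked by hail while parked."},
]
print(build_augmented_prompt("Which accidents were caused by adverse weather?", docs))
```

Asking the model to cite claim IDs mirrors the chatbot in Figure 1, which shows the references behind each answer; the resulting string would then be sent to whichever LLM the application uses.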
Figure 1 shows how the interaction works in practice: through a chat prompt, we can ask the system questions, and the LLM returns answers to the user along with the references it used to retrieve the information contained in the response. Great! We’ve got a nice UI, but how can we build a RAG application? Let’s open the hood and see what’s in it!

Figure 1: UI of the claim adjuster RAG-powered chatbot

Architecture and flow

Before we start building our application, we need to ensure that our data is easily accessible and in one secure place. Operational Data Layers (ODLs) are the recommended pattern for wrangling data to create single views. A separate post walks the reader through the process of modernizing insurance data models with Relational Migrator, helping insurers migrate off legacy systems to create ODLs.

Once the data is organized in our MongoDB collections and ready to be consumed, we can start architecting our solution. Building upon the schema developed in the image search post, we augment our documents by adding a few fields that will allow adjusters to ask more complex questions about the data and solve harder business challenges, such as resolving a claim in a fraction of the time with increased accuracy. Figure 2 shows the resulting document with two highlighted fields, “claimDescription” and its vector representation, “claimDescriptionEmbedding”. We can now create a Vector Search index on the “claimDescriptionEmbedding” array, a key step to facilitate retrieving the information fed to the LLM.

Figure 2: Document schema of the claim collection; the highlighted fields are used to retrieve the data that will be passed as context to the LLM

Having prepared our data, building the RAG interaction is straightforward; refer to this GitHub repository for the implementation details. Here, we’ll just discuss the high-level architecture and the data flow, as shown in Figure 3 below:

1. The user enters the prompt, a question in natural language.
2. The prompt is vectorized and sent to Atlas Vector Search; similar documents are retrieved.
3. The prompt and the retrieved documents are passed to the LLM as context.
4. The LLM produces an answer to the user (in natural language), considering the context and the prompt.

Figure 3: RAG architecture and interaction flow

It is important to note how the semantics of the question are preserved throughout the different steps. The reference to “adverse weather”-related accidents in the prompt is captured and passed to Atlas Vector Search, which surfaces claim documents whose claim descriptions relate to similar concepts (e.g., rain) without needing to mention them explicitly. Finally, the LLM consumes the relevant documents to produce a context-aware answer referencing rain, hail, and fire, as we’d expect based on the user’s initial question.

So what?

To sum it all up, what’s the benefit of combining Atlas Vector Search and LLMs in a claim processing RAG application?

Speed and accuracy: With the data centrally organized and ready to be consumed by LLMs, adjusters can find all the necessary information in a fraction of the time.
Flexibility: LLMs can answer a wide spectrum of questions, meaning applications require less upfront system design. There is no need to build custom APIs for each piece of information you’re trying to retrieve; just ask the LLM to do it for you.
Natural interaction: Applications can be interrogated in plain English without programming skills or system training.
Data accessibility: Insurers can finally leverage and explore unstructured data that was previously hard to access.

Not just claim processing

The same data model and architecture can serve additional personas and use cases within the organization:

Customer Service: Operators can quickly pull customer data and answer complex questions without navigating different systems. For example, “Summarize this customer’s past interactions,” “What coverages does this customer have?” or “What coverages can I recommend to this customer?”
Customer self-service: Simplify your members’ experience by enabling them to ask questions themselves. For example, “My apartment is flooded. Am I covered?” or “How long do windshield repairs take on average?”
Underwriting: Underwriters can quickly aggregate and summarize information, providing quotes in a fraction of the time. For example, “Summarize this customer’s claim history.” or “I am renewing a customer policy. What are the customer’s current coverages? Pull everything related to the policy entity/customer. I need to get baseline info. Find relevant underwriting guidelines.”

If you would like to discover more about Converged AI and Application Data Stores with MongoDB, take a look at the following resources:

RAG for claim processing GitHub repository
From Relational Databases to AI: An Insurance Data Modernization Journey
Modernize your insurance data models with MongoDB and Relational Migrator
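To make the retrieval step of the architecture above more concrete, here is a minimal Python sketch of the two pieces Atlas Vector Search needs: an index definition on the “claimDescriptionEmbedding” field, and an aggregation pipeline whose first stage is `$vectorSearch`. It only builds plain dictionaries, so it runs without a database. The index name `claim_description_vector_index`, the 1536-dimension embedding size, cosine similarity, and the `numCandidates`/`limit` values are assumptions for illustration that must match your embedding model and tuning; this is not the code from the linked repository.

```python
def vector_index_definition(num_dimensions: int = 1536, similarity: str = "cosine") -> dict:
    """Atlas Vector Search index definition for the claim embedding field.

    num_dimensions must equal the length of the vectors produced by your
    embedding model (1536 is an assumption, not a requirement).
    """
    return {
        "fields": [
            {
                "type": "vector",
                "path": "claimDescriptionEmbedding",
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


def claim_search_pipeline(query_vector: list[float], limit: int = 5,
                          index_name: str = "claim_description_vector_index") -> list[dict]:
    """Aggregation pipeline for step 2: retrieve the claims most similar to the prompt."""
    if not query_vector:
        raise ValueError("query_vector must not be empty")
    return [
        {
            # $vectorSearch must be the first stage of the pipeline.
            "$vectorSearch": {
                "index": index_name,
                "path": "claimDescriptionEmbedding",
                "queryVector": query_vector,
                # Candidates examined by the approximate search; using a multiple
                # of the limit (20x here) is an assumed tuning choice.
                "numCandidates": limit * 20,
                "limit": limit,
            }
        },
        {
            # Keep only the text passed to the LLM, plus the similarity score.
            "$project": {
                "_id": 1,
                "claimDescription": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With a recent PyMongo release and an Atlas cluster, these would be used roughly as `collection.create_search_index(SearchIndexModel(definition=vector_index_definition(), name="claim_description_vector_index", type="vectorSearch"))` and `list(collection.aggregate(claim_search_pipeline(query_vector)))`, where `query_vector` comes from the same embedding model that produced the stored `claimDescriptionEmbedding` values. The projected documents are what step 3 hands to the LLM as context.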

April 18, 2024