Congratulations to the 2022 Innovation Award Winners
I just got off stage at MongoDB World, where I had the honor of announcing 15 winners of the ninth annual MongoDB Innovation Awards. The MongoDB Innovation Awards honor projects and people who dream big. They celebrate the groundbreaking use of data to build compelling applications and the creativity of professionals expanding the limits of technology with MongoDB. This year, we received applications from a diverse range of organizations, from emerging startups to industry-leading global enterprises, across a wide variety of industries. We are delighted to announce the winners below. 2022 MongoDB Innovation Award Winners Customer-First and Innovator of the Year Award: BEES Anheuser-Busch InBev (AB InBev), home to several of the world’s most recognizable beer brands, chose MongoDB Atlas as the primary database for its proprietary B2B platform, BEES. The platform digitizes AB InBev’s relationship with its customers, offering convenience, seamless communication, and most important, enhanced business performance. Putting customers first has helped BEES grow to a network of 2.7 million monthly active users across 17 markets, process over 23 million orders, and capture more than $6.5 billion in gross merchandise value during the first quarter of 2022. Data for Good Award: Memorial Sloan Kettering Cancer Center Memorial Sloan Kettering Cancer Center (MSKCC) is a world-renowned, state-of-the-art cancer facility. According to US News & World Report, MSKCC has been ranked as one of the top two hospitals for cancer care in the country for more than 30 years, and among the nation's top pediatric hospitals for cancer care. MSKCC's Department of Pathology plays a pivotal role in diagnosing the type of cancer affecting a patient, which in turn helps providers determine possible treatments. MPath, an advanced software ecosystem developed in-house, is built on MongoDB and supports digital review and reporting of over three dozen Molecular Pathology diagnostic tests. 
To date, MPath has provided digital review and reporting for over 200,000 molecular diagnostic tests. From Batch to Real-Time Award: AT&T To build its next-generation AI-based fraud-detection platform, AT&T quickly discovered that relational technology would not be able to scale and support their application’s needs and requirements. Given their desire for a flexible data model, AT&T turned to MongoDB Atlas, which has decreased their time to market and improved their query response times. As part of an overall modernization effort to enhance an already robust AI environment, MongoDB Atlas will improve performance and further AT&T’s efforts for real-time fraud detection. MongoDB Atlas is also being extended to include the AT&T Feature Store for data modeling. Front Line Heroes Award: Sogei Sogei is an in-house company of the Italian Ministry of Economy and Finances, spearheading public sector digital transformation in Italy. Impressed by the flexibility and performance of MongoDB’s data platform technology, they leveraged MongoDB’s document model to develop and bring to market Italy's official COVID-19 vaccination passport app — in less than 45 days. The "Green Pass" project has generated over 230 million digital certificates to date, helping Italians easily provide vaccination requirements for cultural and sporting events, long-distance travel, nightlife, and indoor dining during the pandemic. Going Global Award: Auth0 Auth0, a product unit within Okta, is an Identity-as-a-Service platform that eliminates the complexity of implementing authentication and authorization capabilities. Looking to further prioritize reliability and security, Auth0 recently migrated its Public Cloud platform from self-hosted MongoDB to MongoDB Atlas to help power billions of authentications per month. This move helps Auth0 scale at a faster pace, allowing them to deliver services to an ever-growing number of customers. 
Additionally, foundational operations tasks that used to take several weeks can now be completed in hours, enabling Auth0 to deliver services faster and more reliably to their customers. Huge Impact Award: Cisco Systems Cisco is best known for its enterprise networking gear, but the tech giant is also a major player in the cybersecurity market, the part of its business that’s leading the transformation to an “Everything-as-a-Service” model. Cisco Secure has been on a multi-year journey to fundamentally change the way enterprises think about security by developing its integrated, cloud-native SecureX platform. An integral component of the platform, SecureX Orchestration allows customers to orchestrate critical security workflows using a no/low-code drag-and-drop interface, and helps SecOps, ITOps, and NetOps teams save critical working hours. By migrating to MongoDB Atlas, the SecureX Orchestration team has reaped benefits such as increased scalability, decreased architectural complexity, improved reliability, and lower total cost of ownership. Industry Transformation Award: Corva With an increasing focus on reducing worldwide greenhouse gas emissions, Corva's technologies play a crucial role in helping the energy industry meet its sustainability objectives. Corva's proven technologies, infrastructure, and deep industry knowledge, combined with the power of MongoDB Atlas, have put it on an unbeatable path to creating a sustainability platform that will transform the industry's journey to net-zero carbon emissions. By leveraging a centralized dataset of emissions, Corva has plans to automate greenhouse gas monitoring, analysis, and benchmarking; monitor real-time energy consumption and emissions; and build applications to raise carbon awareness.
Operational Resilience at Scale Award: Wells Fargo Wells Fargo is a leading financial services company offering a diversified set of banking, investment, and mortgage products and services, as well as consumer and commercial finance. Tempest is a data fabric built to improve the digital customer experience by providing continuous availability and responsiveness even when portions of the bank's infrastructure are experiencing availability interruptions. This data fabric serves Wells Fargo’s 30 million-plus digital retail customers. Savvy Startup Award: Pioneera The World Health Organization now recognizes burnout as an occupational phenomenon — and there has never been a greater need for a solution. Pioneera is combining psychology and technology to prevent and redress toxic workplace issues, starting with the crippling and insidious issue of workplace burnout. Indie, Pioneera's "Grammarly for Mental Health," helps large and small companies reduce burnout and improve engagement, productivity, and collaboration. Founded and based in Australia, Pioneera uses MongoDB to help it scale globally. Unbound Award: Cue Health Cue Health is a healthcare technology company that makes it easy for individuals to access health information and places diagnostic information at the center of care. Their revolutionary new device, the Cue Health Monitoring System, paired with their COVID-19 test, is the first at-home COVID-19 test available over the counter without a prescription, and it is used by the NBA, Johnson & Johnson, and the Mayo Clinic. The company chose MongoDB Atlas, Search, and Atlas Device Sync to power its mobile application, mobile database, and synchronization, enabling consumers to receive data from their Cue devices on their smartphones and securely store it in the cloud. Cue Health is planning on leveraging MongoDB to launch additional tests such as respiratory, women's and men's health, fertility, and more. 
Cutting Edge Award: Goldman Sachs Goldman Sachs is a leading global financial institution that serves a diversified client base of corporations, financial institutions, governments and individuals and holds offices in major financial centers worldwide. Over the past 10 years, MongoDB and Goldman Sachs have had a strong engineering collaboration. Together, we have worked to ensure that MongoDB Core Server and Atlas have product capabilities that enable their use in regulated environments, without compromising the developer experience. Goldman Sachs expanded its utilization of Atlas in FY'22, growing both existing and new Consumer and Transaction Banking use cases across the firm; their utilization of cloud solutions and their forward-thinking, cloud-based approach to deploying next generation banking applications represents a cutting edge approach and serves as a model for the financial services industry. Judges' Choice Award: Getir Ultrafast grocery delivery pioneer Getir has revolutionized last-mile grocery delivery with its 10-minute grocery delivery proposition, making thousands of everyday items available within minutes. The company originally built its core grocery delivery platform on MongoDB Community and migrated to MongoDB Atlas. Getir achieved superior performance and reliability, regardless of spikes in traffic during the COVID-19 pandemic, and also relied on Atlas's always-on, multi-region clusters for 99.995% uptime during its critical U.S. launch. Getir utilizes almost 350 clusters, deployed across projects to cover each aspect of its product, with a microservices architecture to create incredible resilience across global markets and time zones. In the past 12 months, Getir has scaled successfully across geographies with minimal downtime due to this approach. 
Seamless Migration Award: Truist Banks struggle to keep up with the digital natives and neobanks, but Truist's Consumer Tech organization is striving to set an example of how major banks can compete in the digital world. Its Client Availability Layer platform, or CAL, had to consolidate terabytes of customer data from heritage systems of record into a highly available, scalable operational data layer, built from scratch, that keeps digital applications up and running. That required modern technology. Truist decided to implement MongoDB on AWS Outposts as a resilient and secure option aligned with the bank's maturing cloud strategy, which has allowed CAL to support new, complex digital banking needs. Certified Professional of the Year Award: Mohit Talniya Mohit Talniya works with PeerIslands, a boutique MongoDB SI based out of the Cayman Islands. He is passionate about building cloud-native applications, solving technical challenges, MongoDB, and cryptocurrency. As a certified MongoDB developer, Mohit has worked with MongoDB teams on crucial, time-sensitive projects, including a mission-critical real-time data migration and building a MongoDB persistence layer for an open-source OAuth framework. He loves playing ping-pong and cricket in his spare time. The William Zola Award for Community Excellence: Prasad Saya Prasad is a MongoDB Certified Developer and a natural-born mentor who harnesses his deep curiosity about technology and channels it into providing informative and helpful answers for his fellow developers. Active on Stack Overflow, our own Community Forums (where he has achieved the rank of Forum Elder), and other technical communities such as JavaRanch, Prasad is always there with a thorough understanding of the problem and a detailed answer to get folks going on the right path.
Safe Software Deployments: Through the Looking Glass
We’ve covered a lot of ground in this Safe Software Deployment series, from the 180 Rule to Z Deployments to the Goldilocks Gauge. But there is an elephant in the room. Or should I say, a jabberwock. In Lewis Carroll’s novel Through the Looking Glass, Alice discovers that the mirror above her mantel is not a mirror at all, but a doorway to another world in which things work very differently. When developers push software from staging to production, they often have a similar experience. Though they want to believe that staging and production are the same, they discover that staging is not a mirror at all, and production is another world, in which things work very differently. And out of that distortion come bugs and outages. The bottom line is this: Staging ≠ Production. And it never will be. There are simply too many variables between these two environments to ever achieve exact alignment. Those environments can be different in hardware: CPU cores, threads per core, cache size, microcode; bus architectures; memory size; firmware. Or different in software or configuration: OS versions; compilers; libraries; network traffic profiles. Or different in network topology: edge caching; DNS and directory services. And of course we know that no matter how diligently you test in staging, customer workloads exercise the software in different ways. But the major reason they are different is that both environments have had a different set of software deployed to them over time, with a different set of configuration parameters and a different combination of patches, hacks, and rollbacks. That staging is a mirror of production is one of the great delusions of software development. Developers often tell this lie to themselves, to overcome their fear of pushing to prod. Worse, developers often know the truth, but don’t really know how to explain this ambiguity to their management chain, leading to inevitable trust issues when deployments fail. So what can we do about it?
First, accept reality. Modern, distributed software systems are nonlinear, and all test environments are simulacrums. It’s particularly important to help managers understand that even if it were possible to create exact duplicates of production (which it’s not), it would be practically and financially unjustifiable. Second, approach testing like an actuary. The leap from staging to production is essentially an exercise in probability. You should know your architecture, operational characteristics, and costs well enough to prioritize tests and reduce risk. You may even want to create two or more test environments that are deliberately different to reduce the odds of failure. And you should continue to run tests after the release, so you can surface bugs before your customers do. Third, if you can, treat both production and staging like cattle, not pets. If you have an enlightened software organization, one that believes in best practices, set up your systems so that you can blow away and recreate both your production and staging environments at regular intervals. This will reset many of the deviations and much of the environment drift that build up over time. Finally, you have to be able to do automated rollbacks (the 180 Rule) that work reliably (Z Deployments) and that are the right size for optimum efficiency and safety (the Goldilocks Gauge). ;-) I saved this column for last because completely eliminating the differences between staging and production is not a solvable problem. And frankly, you shouldn’t even try. But you don’t want to get caught flat-footed either, so you need a system of best practices that greatly reduces risk and fosters confidence. In other words, a system of Safe Software Deployment that helps you overcome the fear of pushing to prod. And most importantly, everyone in your organization needs to have a common understanding of the problem, from the top down.
So feel free to print this post out, slide it under the door of your manager, and slink away like that Cheshire Cat. Have another technique for managing the divergence between staging and production? Share it with me at @MarkLovesTech.
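The environment drift described above can at least be made visible. What follows is a minimal sketch, not a method from this series: it assumes each environment can be summarized as a flat dictionary of settings, and every key and value in the sample snapshots is hypothetical. It simply reports each setting that differs between staging and production.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # placeholder for a setting that exists on one side only


def environment_drift(
    staging: Dict[str, Any], production: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (staging_value, production_value)} for every difference."""
    drift = {}
    for key in sorted(set(staging) | set(production)):
        s_val = staging.get(key, MISSING)
        p_val = production.get(key, MISSING)
        if s_val != p_val:
            drift[key] = (s_val, p_val)
    return drift


# Hypothetical snapshots; real ones would come from your own inventory tooling.
staging = {"os_version": "22.04", "cpu_cores": 8, "libssl": "3.0.2", "feature_x": True}
production = {"os_version": "20.04", "cpu_cores": 64, "libssl": "3.0.2", "edge_cache": "on"}

for key, (s_val, p_val) in environment_drift(staging, production).items():
    print(f"{key}: staging={s_val!r} production={p_val!r}")
```

A report like this will never be empty, which is the point of the post. Its value is in giving developers and managers a shared, concrete list of known differences to weigh before a release.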
Engineering and Reducing DIRT Costs: Why an Outdated Data Architecture Is an Obstacle to Innovation
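In March 2021, I wrote about The Innovation Tax. The "innovation tax" describes a situation in which cumbersome processes and outdated technologies keep engineering teams from building excellent technology that delights customers. In the months since then, my thinking has evolved much further. I could not have guessed how many technology leaders would quickly recognize these problems in their own companies and share their deep frustration with me. This post brings that evolved thinking together with the enormous amount of feedback I received. It also offers practical ways to reduce your innovation tax burden. Who would turn that down? The innovation tax is as real as income tax. Of course it saps morale, with the attrition and churn that follow, but it also carries financial and opportunity costs. Companies with a heavy innovation tax burden often fall behind because their people and resources are concentrated on maintenance rather than innovation. We call this the DIRT cost. Why DIRT? First, it is rooted in data (D): modern applications need access to real-time data to create rich user experiences, and legacy databases often struggle to support that. DIRT affects innovation (I): if development teams must constantly work out how to support a complex and fragile architecture, they have little time left to innovate. DIRT is recurring (R): like a tax (T), it is not a cost you pay once and are done with. Quite the opposite. Every new project adds components, frameworks, and protocols that different teams must manage, so DIRT makes each new project harder still. In retrospect, it is clear that technology leaders would recognize this cost and want to work out how much of it their data architecture causes, or could save. Data is sticky, strategic, heavy, and intricate, yet it is the core of the modern digital company. Modern applications have far more sophisticated data requirements than the applications we built only 10 years ago. There is clearly more data, but the complexity has grown even more. Companies must respond faster and more intelligently to the signals in their data. Legacy technologies, including relational databases that are rigid, inefficient, single-model, and hard to program, do not meet these expectations. In more than 300 conversations with senior executives since I joined MongoDB in 2020, fewer than a handful of CTOs have disputed this. When the tech stack cannot handle the requirements of new applications, engineering teams often add single-purpose databases for the job at hand (time series, text, graph, and so on). Then they build a series of pipelines to move data back and forth. As a result, everything becomes slow and complicated, and even political. At that point, it is time to polish up your LinkedIn profile. If this were rare, it could be ignored. But large enterprises run hundreds or even thousands of applications, and each application may have its own data sources and pipelines. As data stores and pipelines multiply over time, a company's data architecture starts to look like a tangled plate of spaghetti. Before long, you are operating and maintaining an entire middleware layer of ETL, ELT, and streaming. The variety of technologies, each with different frameworks, protocols, and sometimes languages, makes collaboration among developers harder.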
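Scaling is harder still, because every architecture is bespoke and therefore fragile and unstable. Developers spend their precious "flow" time on integration work instead of building the new applications and features that the business and its customers want. Enterprise architects end up spending much of their time on the wrong things. I believe most customers are ready to adopt a new approach to data architecture. One of the best parts of my job is listening to and learning from other senior executives. After the pandemic made it impossible to meet in person, MongoDB moved these conversations online and invited technology leaders to meet one-on-one or in groups and talk openly about their biggest problems. In one of these online sessions, a CTO said, “Technical debt should be carried on your CFO's balance sheet.” Even on Zoom, the force of that statement was clear. We also began looking at slide decks in which some well-known venture capital firms describe data architecture. Venture capital firms naturally have to position each of their portfolio companies as a key player in the data architecture of the future. But the overall vision was not compelling. One technology leader said, “When I look at 20 net-new technologies I need to learn, it’s terrifying.” Other leaders said that just looking at these architecture diagrams was a little unsettling, because they already knew their own data architecture was that complicated. Most knew they needed to simplify their data architecture, but the work was so daunting that they were postponing it indefinitely. I recently met with a large health care company. Its executives saw simplifying their data architecture as difficult, but having boldly started the work, they now say that simplification is not only essential but also teaches them a great deal as they untangle their architecture. In many cases, the innovation tax shows up as an inability even to consider new technology. The underlying architecture is so complex that it is hard to maintain, let alone understand well enough to change. This is why so many senior people at large enterprises are sitting with their fingers in the dike of innovation, waiting for retirement. They think they cannot modernize. It will not surprise you to hear how MongoDB can help solve this problem: a general-purpose database can handle all types of data at speed and scale. Let me explain. I have spent 35 years working professionally with databases, and I joined MongoDB for one reason: I became convinced that at MongoDB we could build the database and application development environment I had wanted to create for at least 30 years. Today MongoDB's vision goes beyond the database toward an application data platform that serves a broader range of purposes and lets you accelerate and simplify how you build any type of application. This shift reflects our relentless drive toward a larger goal that remains the same as ever: making data easy to work with. We want data to be a driver of innovation, not an obstacle. And we want technology teams finally to be able to untangle stacks made complex by sprawl and get rid of their DIRT. So where should you start? First, you need to understand exactly how DIRT is holding your technology teams back. Do your developers struggle to collaborate because the development environment is fragmented? Do schema changes take longer to roll out than the application changes they are meant to support?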
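Do you have trouble building a 360-degree view of your customers? If so, why? These questions are a useful starting point for a first DIRT analysis. It is also worth looking closely at your applications and data sources, as well as at what it would take to migrate your data to an application data platform. For example, survey your application objects and every application that interacts with them. You can then assign each object a complexity score based on its properties, methods, collections, and so on. Now step back, identify each application that connects to each object, and rank it by how mission-critical it is, how many users it has, how many tasks it must perform, and how complex each task is. Once you fully understand all this complexity, you will be better placed to plan your move off legacy systems, starting with the data sources that are least complex and need the least integration. Of course, metrics and benefits vary by situation, but this makes a good starting point. That does not mean any of this is easy. Like many of you, I have spent most of my career working on these problems. That also means I know how these problems unfold, and I can see the beginning of a way for companies to clean up their DIRT. I will keep writing about these challenges, and I hope to offer some perspective. To learn more about DIRT, download the MongoDB white paper. As always, I welcome your agreement, disagreement, or other views with open arms. Tweet me at @MarkLovesTech. You can also find my latest thoughts on MongoDB and related topics at marklovestech.com.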
Safe Software Deployments: The Goldilocks Gauge
Once upon a time, software was written to magnetic tapes or burned onto CDs and sent to customers through the mail. It was an expensive, time-consuming distribution process — and one that didn’t lend itself to updates. You either got it right or wrong. In fact, these shipments were so high-stakes that the final CD or tape was called “the golden master.” As a result, software companies would typically ship new versions of their software only every two to three years. It was a terrifying time for developers. These “Big Bang” deployments meant that one bug could cost a company millions. Imagine recutting 100,000 tapes. And a single developer could be responsible for the company not making its quarterly numbers. These deployments were too big. Today, we live in a world in which software can be continuously improved. Developers no longer have to wait years to see their work in the hands of users. In fact, some software goes into production as it's being written. Think of Eclipse hooked directly up to unit tests, integration tests, and a CI/CD pipeline. But this comes with its own set of problems. For one, this amounts to integration testing in production and therefore requires incisive instrumentation — at least if you want to see problems as they arise, or if you want the ability to back out of the new code without damaging user data. Additional complexity comes in the form of feature flags to toggle between code paths. These require more work and should be removed once a new feature is rolled out and stable. Occasionally, removing the scaffolding to support this style of continuous nano-deployment can activate unknown bugs. In my personal experience at big and small companies, this is just as bad as big bang releases. There comes a tiny unit size of deployment where the overhead of the system and the cognitive load on the teams actually increases. These deployments are too small. 
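The feature flags mentioned above are easy to picture with a small example. This is a minimal sketch under stated assumptions, not the mechanism any particular team or product uses: the flag name, the percentage rollout, and both code paths are hypothetical. It shows why a flag adds work, since two code paths must be kept alive until the flag is removed.

```python
import hashlib


def flag_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user inside or outside a percentage rollout."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < rollout_percent


def legacy_total(prices):
    # Existing code path, kept alive only while the flag exists.
    return sum(prices)


def new_total(prices):
    # Hypothetical new code path: same sum, rounded to cents.
    return round(sum(prices), 2)


def checkout_total(prices, user_id, rollout_percent=10):
    """Route a request to the new or the legacy path based on the flag."""
    if flag_enabled("new_checkout_total", user_id, rollout_percent):
        return new_total(prices)
    return legacy_total(prices)


print(checkout_total([1.25, 2.5], user_id="user-42"))
```

Once the rollout reaches 100 percent and is stable, `flag_enabled`, `legacy_total`, and the branch in `checkout_total` are the scaffolding that has to be removed.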
As you might have guessed by now, the Goldilocks Gauge is all about finding the pace and size of deployment that is just right; the perfect amount that keeps the engineering team in flow, that state where everybody is working at top productivity and any cognitive load is about the business value they are trying to produce and the complexity of the software and data needed to produce that value. How do I define that amount? It’s a quantity of innovation that is small enough to hold in your head, but large enough to deliver measurable value. Let me give an example. At one of my previous employers, we used to average about 90 deployments a week. It wasn’t enough. The tech team was more than 2,000 people, and deployment on each team was often once a quarter (or worse). As a result, code wasn’t being tried out fast enough in production, slowing down the delivery of customer value. The deployments were often so complicated that debugging required many people and many hours. That’s not what you want for a live-side app used by millions of people. You want deployments that are small enough to quickly debug, and shipped often enough that people still have all the context in their heads. Years before this, it had been even worse, with only about 10 services, deploying once per quarter or less. Getting to 90 deployments a week was a great achievement. So we can summarize that “small deployments, shipped often” is the goal. This isn’t a surprise to most of you. But, sadly, even though we now had a lot more services and most deployed regularly, the main services were still monoliths and deployed way too infrequently. And that ‘monolith’ word leads me to another problem. In addition to having deployments be small and often, you want to limit the number of people who work on each one. It’s just another kind of complexity — an even more subtle one. A monolith has lots of lines of code, lots of dependencies, and lots of people working on it. 
When you only have one release once per quarter, and there are 100 people working on the service, every one of those people likely has multiple code changes in there. The complexity builds and builds — and becomes larger than anybody can hold in their head. Complexity is the enemy. That complexity can be the complexity of the code itself or the complexity of the human relationships and knowledge needed to write and maintain it. Just like you want to have each piece of code depend on a small number of others, you want the same for the people in your organization. Some of you may be familiar with the Dunbar Number , which refers to the maximum number of people with whom you can establish and maintain relationships. The Dunbar number also refers to how many people are in each of your circles of friendship: there is a tight circle to whom you relate quite easily, an intermediate group that you’re still quite comfortable with, and larger groups made up of acquaintances. I’m going to take some liberties with Dunbar’s research and say that in some ways, this applies to teams of software developers as well. Teams need to be small enough to foster familiarity and maintain context, which leads to trust. Teams need to engage with units of work that are simple and easy to understand. These units need to be small enough to hold in one person’s brain, so that when they get an error, they can go right back in and fix it — or know exactly who to go to. Familiarity, trust, and small units of work create the conditions for rapid problem resolution. Of course, you then build up these small teams into larger units, all producing software in harmony — with loose coupling, but tight alignment. This is a critical part of complexity management, which also includes clean architectures and coding best practices. So what did we do? We broke the code and the databases down into smaller and smaller pieces. The teams grew the number of services by a factor of ten, from 40 to 400. 
And we made our teams smaller, with each team being independent but also being part of larger groups. Over the next year, we went from 90 deployments a week to more than 1,100, with each smaller team now deploying their software multiple times a week. We increased the velocity of innovation and reduced downtime by 75% at the same time. These deployments were just right. They were the right size, shipped at the right rate, with the right number of people involved in each one. And, just as Goldilocks was happy when she found the porridge that was just right, our engineers, product managers, and executives were happier with deployments that were just right. Because the one thing that makes everybody at a tech company happy is getting new code and features into the hands of end users faster and with less drama. Of course, the Goldilocks Gauge is not possible without the 180 Rule and Z Deployments , both of which help eliminate the fear of deployment. Combined, they help create a system of safe software deployment. I’ll be sharing the final element of this system in my next post, where I’ll explain my “Through the Looking Glass” theory of aligning your development, staging, and production environments. Of course, your systems may vary, and may even be better than what I’ve come up with. I’d love to hear about your experiences and tricks for safe deployments. Reach out to me on LinkedIn or on Twitter at @MarkLovesTech . You can also see a collection of these blogs, as well as a bunch of other things I’ve found useful, at marklovestech.com .
Engineering, Done DIRT Cheap: How an Outdated Data Architecture Becomes a Tax on Innovation
In March 2021, I wrote about The Innovation Tax : the idea that clunky processes and outdated technologies make it harder for engineering teams to produce excellent tech that delights customers. In the months since then, my thinking has evolved even further. I couldn’t have guessed how many technology leaders would immediately recognize these problems in their own organizations and share their own deep frustrations with me. This article puts that evolved thought together with the massive feedback that piece received. It will give you actionable ways to decrease your tax burden — and who wouldn’t want that? The innovation tax, like income tax, is real. Of course, it saps morale (with resulting attrition and churn), but it also has other financial and opportunity costs. Taxed organizations see their pace of innovation suffer as people and resources are locked into maintaining rather than innovating. We named this tax DIRT . Why? Well, it’s rooted in data (D), because it so often springs from the difficulty of using legacy databases to support modern applications that require access to real-time data to create rich user experiences. It affects innovation (I), because your teams have little time to innovate if they’re constantly trying to figure out how to support a complex and rickety architecture. It’s recurring (R), because it’s not as if you pay the tax (T) once and get it over with. Quite the opposite. DIRT makes each new project ever more difficult because it introduces so many components, frameworks, and protocols that need to be managed by different teams of people. In retrospect, it’s clear that technology leaders would recognize this tax and immediately grasp the degree to which it’s caused -- or cured -- by their data architecture. Data is sticky, strategic, heavy, intricate -- and the core of the modern digital company. Modern applications have much more sophisticated data requirements than the applications we were building only 10 years ago. 
Obviously, there is more data, but it’s more complicated than that: Companies are expected to react more quickly and more cleverly to all of the signals in that data. Legacy technologies, including single-model, rigid, inefficient, and hard-to-program relational databases, just don’t cut it. In over 300 CxO conversations I've had since joining MongoDB in 2020, fewer than a handful of CTOs disputed this statement. When your tech stack can’t handle the demands of new applications, engineering teams will often bolt on single-purpose niche databases to do the job (think time series, text, graph, etc.). Then they’ll build a series of pipelines to move data back and forth. And everything will get slow and complicated, and even political. Time to polish up that LinkedIn profile. If this were rare, it wouldn’t be such a big deal. But large enterprises can have hundreds or thousands of applications, each with its own sources of data and its own pipelines. Over time, as data stores and pipelines multiply, an organization’s data architecture starts to look like a plate of spaghetti. Soon you’re operating and maintaining an entire middleware layer of ETL, ELT, and streaming. The variety of technologies, each with its own frameworks, protocols, and sometimes languages, makes it harder for developers to collaborate. It makes it extremely difficult to scale, because every architecture is bespoke and brittle. Developers spend their precious “flow” hours doing integration work instead of building new applications and features that the business needs and customers will love. Enterprise architects often end up spending their time on all the wrong things. It’s clear to me that most customers are ready for a new approach to data architecture. One of the best parts of my job is listening to and learning from other CxOs.
Since the pandemic made it impossible to do that in person, MongoDB moved these discussions online, inviting technology leaders to hash out some of their biggest problems 1:1 and in groups with me. In one of those sessions, a CTO commented, “Technical debt should be carried on your CFO's balance sheet.” Even on Zoom, the power of that statement was clear. We also started looking at slide decks about data architecture from some of the best-known venture capital firms. Certainly VCs must position each of their portfolio companies as a critical player in the data architecture of the future. But the overall vision was not compelling. One technology leader said, “When I look at 20 net-new technologies I need to learn, it’s terrifying.” Others commented that just looking at these architecture diagrams was a little off-putting, because they knew their own organization’s data architecture was at least that complicated already. They knew they needed to simplify their data architecture, but more than one admitted to postponing this work -- indefinitely -- because it was just too daunting. I recently met with a major health care company whose executives think it’s just barely possible, but they are bravely diving in anyway, knowing that they must do it and that they’ll learn along the way as they tear down their monoliths. In many cases, the innovation tax manifests as the inability to even consider new technology because the underlying architecture is too complex and difficult to maintain, much less understand and transform. This is why a lot of senior people at enterprise companies are sitting with their fingers in the transformation dike, waiting for retirement -- they think they can’t modernize. It won’t surprise you that we also saw how MongoDB, as a general purpose database able to handle all types of data at speed and scale, could help solve this problem. Let me be clear. 
I’ve been working on or with databases for my entire 35-year career, and I joined MongoDB for a reason: I believe we can build the database and application-building environment that I’ve wanted to create and use for at least 30 of those years. Our vision of MongoDB goes beyond our namesake database to a broader, more versatile data platform that allows you to accelerate and simplify how you build any type of application. It represents significant progress toward our larger goal, which remains the same as ever: to make data stunningly easy to work with. We want to see data become an enabler of innovation, not a blocker. And we want to finally allow technology teams to start to untangle their sprawl and get rid of their DIRT. Where to start? It’s good to have a better understanding of just how DIRT might be holding your teams back. Do your developers have trouble collaborating because the development environment is so fragmented? Do schema changes take longer to roll out than the application changes they’re designed to support? Do you have trouble building 360-degree views of your customers? And if so, why? These are all good places to start digging in the DIRT. You might also take a hard look at your applications and data sources, as well as what it would take to move your data onto a new data platform. That could mean identifying the objects in your applications and all the applications that interact with them. You could then assign a complexity score to each object based on characteristics such as its properties, methods, collections, and attributes. Now take a step back and identify each application that connects to each of those objects and rank it based on how mission-critical it is, how many people rely on it, how many tasks it has to perform, and the complexity of those tasks. 
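To make the scoring exercise concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the class names `DataObject` and `Application`, the unweighted sum used for object complexity, the formula used to rank applications, and the sample data. Your own weights and inputs will certainly differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """An object found in an application (all counts are illustrative inputs)."""
    name: str
    properties: int
    methods: int
    collections: int
    attributes: int

    @property
    def complexity(self) -> int:
        # Assumption: an unweighted sum. Real weights would be team-specific.
        return self.properties + self.methods + self.collections + self.attributes


@dataclass(frozen=True)
class Application:
    """An application that connects to one or more objects."""
    name: str
    criticality: int      # 1 (nice to have) .. 5 (mission-critical)
    users: int            # people who rely on it
    tasks: int            # number of tasks it performs
    task_complexity: int  # 1 (simple) .. 5 (complex)
    objects: frozenset    # names of the DataObjects it touches

    @property
    def rank(self) -> int:
        # Assumption: a simple illustrative formula, not a standard metric.
        return self.criticality * self.users + self.tasks * self.task_complexity


def migration_order(objects, applications):
    """Return object names, least complex and least integrated first."""
    def key(obj):
        connected = [a for a in applications if obj.name in a.objects]
        return (obj.complexity, len(connected), sum(a.rank for a in connected))
    return [obj.name for obj in sorted(objects, key=key)]


# Hypothetical inventory, made up for this example.
objects = [
    DataObject("orders", properties=12, methods=8, collections=3, attributes=5),
    DataObject("profile", properties=4, methods=2, collections=1, attributes=1),
    DataObject("audit_log", properties=3, methods=1, collections=1, attributes=0),
]
applications = [
    Application("billing", 5, 2000, 10, 4, frozenset({"orders", "profile"})),
    Application("newsletter", 1, 50, 2, 1, frozenset({"profile"})),
    Application("reporting", 3, 200, 6, 3, frozenset({"orders", "audit_log"})),
]

print(migration_order(objects, applications))  # ['audit_log', 'profile', 'orders']
```

The sort key puts object complexity first, then the number of connected applications, then the combined rank of those applications, so the simplest and least entangled data sources surface at the top of the list.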
Once you have a better handle on all this complexity, you’ll be better positioned to create a plan to move off your legacy systems, perhaps starting with the least complex and least integrated data sources. Of course, your metrics and your mileage will vary, but the point is to start. I don’t pretend any of this is easy. Like many of you, I’ve spent most of my career working on problems just like these. But that also means I know progress when I see it, and I see the beginning of a way for organizations to start to clean up their DIRT. I’ll continue to write about these challenges and, I hope, add some perspective. If you’re curious to learn more about DIRT, you can download our white paper. As always, I’m eager to have you tweet your alignment, lack thereof, or other thoughts at @MarkLovesTech. You can also reach out to me on marklovestech.com, where you will find a compilation of my latest musings related to MongoDB and otherwise.
Safe Software Deployments: Z Deployments
If you’ve gotten this far in my Safe Software Deployment series, you know how scary deployment day can be. Sleepless nights. Knots in the stomach. Cold sweats. These are the symptoms of uncertainty. And three decades of experience have taught me that all the positive thinking in the world won’t ensure a bug-free deployment. That’s why I’ve developed a number of techniques that can consistently help teams minimize fear and achieve safe software deployment. In the last post, we discussed the 180 Rule. The purpose of this post is to explain how you can use “Z Deployments” to mitigate both fear and downtime. In future posts, we’ll look at both the Goldilocks Gauge and Through the Looking Glass.
Z Deployments are more than a catchy name. This is all about failed rollbacks, which in my experience are the biggest source of downtime in any software deployment pipeline. Now, we all try our best to eliminate the need for rollbacks in the first place, but when they do happen, we want them to be successful. However, in most companies, rollbacks are only tested in Prod, not in the prior stages of the pipeline. Even if you use the 180 Rule, which encourages quick and automated rollbacks, you don’t have any more certainty that they will work. This is where Z Deployments come in. With a Z Deployment, the goal is to make rollbacks just as predictable and reliable as your normal “roll forward” software deployments. I call this technique a Z Deployment, because if you chart out the process, it looks like a Z. But you can also think of Z Deployments as akin to pressing “Command Z” on your keyboard: undo. Fast, simple, no drama.
Here’s how it works:
1. Roll your code forward from development into staging.
2. In staging, do your canary testing.
3. Then roll back into development.
4. Do your canary testing again. If it doesn’t work, then you just proved that your rollback code was faulty in some way.
5. Roll your code forward into staging again, and do your full testing.
6. If it’s successful, roll your code forward into production.
Of course, this only works if your staging environment is clean and your team trusts it. I’ll get into this more in a future post called “Through the Looking Glass.” But the bottom line is that developers need to know that things will work in production, including any needed rollbacks. And the only way to do that is to test rollbacks in staging. Your version of canary tests and full tests might be different. In a perfect world you’d run the full tests all three times, but often build systems aren’t set up to do that quickly enough.
Too often, staging is not clean. But generally, when developers deploy to staging, their added functionality tends to work. Everyone else is using staging, and their functionality is working, too. This is the “Happy Path,” where engineers test that their new thing works. That sounds great. But what else happens? Adjacent things get broken. Often when you roll back, you’re not necessarily returning to your system’s original state, either for your own software change or for the adjacent software components. Your rollback code has to undo all the state changes your deployment to staging (or prod) may have made. Otherwise, the staging environment becomes polluted, and the results in staging won’t match the results in production. Developers lose faith in staging, and deployment again becomes a terrifying ordeal.
I used to work with someone who was absolutely obsessive about staging. He ran testing, and he refused to have a long-term staging environment. Instead, his team blew away staging every month and rebuilt it from scratch. Did I like this? Absolutely. Did it work? Yes. Developers trusted staging, which meant that deployments to prod were less scary.
The next step of safe software deployment is to embrace the Goldilocks Gauge, which helps make deployments routine and even boring, in a good way. 
It also makes both the 180 Rule and Z Deployments easier to execute, and it’s a necessity for teams working toward continuous development. In the meantime, feel free to share your own techniques for safe deployments at @MarkLovesTech.
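For readers who think in code, the forward, back, forward sequence of a Z Deployment can be sketched as a small Python function. This is only an illustration under stated assumptions: the four callables (`roll_forward`, `roll_back`, `canary_ok`, `full_tests_ok`) are hypothetical hooks you would wire to your own pipeline, the outcome strings are made up, and running the second round of canaries against the rolled-back staging environment is my reading of the steps. `make_fake` is a toy stand-in for a real environment.

```python
from typing import Callable


def z_deployment(
    roll_forward: Callable[[str], None],
    roll_back: Callable[[str], None],
    canary_ok: Callable[[str], bool],
    full_tests_ok: Callable[[str], bool],
) -> str:
    """Run the forward / back / forward sequence and report the outcome."""
    # Roll forward into staging and run the canary tests.
    roll_forward("staging")
    if not canary_ok("staging"):
        roll_back("staging")
        return "canary-failed"

    # Roll back, then run the canaries again. A failure here means the
    # rollback did not restore a working state, so the rollback code is faulty.
    roll_back("staging")
    if not canary_ok("staging"):
        return "rollback-faulty"

    # Roll forward again and run the full test suite.
    roll_forward("staging")
    if not full_tests_ok("staging"):
        roll_back("staging")
        return "full-tests-failed"

    # Only now promote to production.
    roll_forward("production")
    return "released"


def make_fake(rollback_is_clean: bool):
    """Toy environment: a faulty rollback leaves stray state behind."""
    state = {"staging": "v1", "production": "v1"}

    def roll_forward(env: str) -> None:
        state[env] = "v2"

    def roll_back(env: str) -> None:
        state[env] = "v1" if rollback_is_clean else "v1+leftover-state"

    def canary_ok(env: str) -> bool:
        return state[env] in ("v1", "v2")

    def full_tests_ok(env: str) -> bool:
        return state[env] == "v2"

    return state, roll_forward, roll_back, canary_ok, full_tests_ok


state, *hooks = make_fake(rollback_is_clean=False)
print(z_deployment(*hooks), state["production"])  # rollback-faulty v1
```

The point of the sketch is the ordering: production is never touched until the rollback path has been exercised in staging and shown to leave the environment in a working state.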
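Safe Software Deployments: Overcoming the Fear and Loathing of Deploying to Production
Over the course of my career, my roles have given me the privilege of deploying many different types of software. I have shipped CDs, deployed customer software over the web, and updated database instances and control planes. I have also live-updated large, running, mission-critical systems. I call this a privilege because delivering software to end users is what software engineers love most. But deployment is not all fun and games. Every deployment brings its own problems, yet they all have one thing in common: fear. If you are responsible for deploying important software, you know exactly what I mean. You develop, prepare, and test the software. And when the day finally comes for it to set sail, you hope and pray that it will sail smoothly on the ocean of production. At most companies, production differs markedly from the development and staging environments, so there is no way of knowing whether code that worked in staging will also work in production. One thing is certain, though: if the software fails, everyone will know about it. That is why it is frightening. There is a line that best captures the effect this fear has on developers. Frank Herbert, author of the science-fiction novel Dune, wrote that “fear is the mind-killer.” Fear weakens the spirit of experimentation. It saps the willingness to take risks and leads to bad habits, such as putting off deployments for months. Above all, it slows the pace of innovation (see my post on the innovation tax that many companies are paying). Deploying to production is certainly scary. But over the past 30 years, working with my colleagues, I have developed several methods for creating the conditions for safe, confident deployments. The next four blog posts in this series will look at each in turn:
· The 180 Rule: enabling automated deployments that can be rolled back quickly and easily
· Z Deployments: limiting downtime caused by failed rollbacks
· The Goldilocks Gauge: getting the size and frequency of deployments right
· Through the Looking Glass: aligning the development, staging, and production environments
These methods are not perfect and do not guarantee bug-free deployments. But in my experience they are the best strategies available. They also help build a culture of confidence within an engineering team, which makes meaningful innovation possible. To get started, the next blog post will introduce the “180 Rule,” which helps reduce minutes of downtime in production. In the meantime, feel free to share your own tips and techniques for safe deployments at @MarkLovesTech.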
Safe Software Deployments: The 180 Rule
In my last post, I talked about the anxiety developers feel when they deploy software, and the negative impact that fear has on innovation. Today, I’m offering the first of four methods I’ve used to help teams overcome that fear: The 180 Rule. Developers need to be able to get software into production, and if it doesn’t work, back it out of production as quickly as possible and return the system to its prior working state. If they have confidence that they can detect problems and fix them, they can feel more confident about deploying. All deployments have the same overall stages:
· Deployment: You roll the software from staging to production, either in pieces -- by directing more and more transactions to it -- or by flipping a switch. This involves getting binaries or configuration files reliably to production and having the system start using them.
· Monitoring: How does the system behave under live load? Do we have signals that the software is behaving correctly and performantly? It’s essential that this monitoring focuses more on the existing functionality than just the “Happy Path” of the new functionality. In other words, did we damage the system through the rollout?
· Rollback: If there is any hint that the system is not working correctly, the change needs to be quickly rolled back from production. In a sense, a rollback is a kind of deployment, because you’re making another change to the live system: returning it to a prior state.
The “180” in the name of the rule has a double meaning. Of course, we’re referring here to the “180 degree” about-face of a rollback. But it’s also a reference to an achievable goal of any deployment. I believe that any environment should be able to deploy software to production and roll it back if it doesn’t work in three minutes, or 180 seconds. 
This gives 60 seconds to roll binaries to the fleet and point your customers to them, 60 seconds to see if the transaction loads or your canaries see problems, and then 60 seconds to roll back the binaries or configurations if needed. Of course, in your industry or for your product, you might need this to be shorter. But the bottom line is that a failed software deployment should not live in production for more than three minutes. Developers follow these three stages all the time, and they often do it manually. I know what you’re thinking: “How can any human being deploy, monitor, and roll back software that fast?” And that is the hidden beauty of the 180 Rule. The only way to meet this requirement is by automating the process. Instead of making the decisions, we must teach the computers how to gather the information and make the decisions themselves. Sadly, this is a fundamental change for many companies. But it’s a necessary change. Because the alternative is hoping things will work while fearing that they will not. And that makes developers loath to deploy software. Sure, there are a lot of tools out there that help with deployments. But this is not an off-the-shelf, set-it-and-forget-it scenario. You, as the developer, must provide those tools with the right metrics to monitor and the right scripts to both deploy the software and possibly roll it back. The 180 Rule does not specify which tools to use. Instead it forces developers to create rigorous scripts and metrics, and ensure they can reliably detect and fix problems quickly. There’s a gotcha that many of you are thinking of: The 180 Rule is not applicable if the deployment is not reversible. For example, deploying a refactored relational schema can be a big problem, because a new schema might introduce information loss that prevents a roll-back. Or the deployment might delete some old config files that aren’t used by the new software. 
I’ll talk more about how to avoid wicked problems like these in my subsequent posts. But for now, I’m interested to hear what you think of The 180 Rule, and whether you’re using any similar heuristics in your approach to safe deployment.
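As a rough illustration of what automating the three stages might look like, here is a minimal Python sketch. It is not a real deployment tool: `deploy`, `healthy`, and `roll_back` are hypothetical callables standing in for your own scripts and metrics, the `Result` type and the five-second polling interval are assumptions, and the clock and sleep functions are injectable only so the logic can be exercised without waiting a full minute.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

STAGE_BUDGET_SECONDS = 60.0  # from the post: 60 s each to deploy, monitor, roll back


@dataclass
class Result:
    outcome: str                                 # "kept" or "rolled_back"
    seconds: dict = field(default_factory=dict)  # time spent in each stage

    @property
    def within_180(self) -> bool:
        # True when all stages together fit in the three-minute goal.
        return sum(self.seconds.values()) <= 3 * STAGE_BUDGET_SECONDS


def deploy_180(
    deploy: Callable[[], None],
    healthy: Callable[[], bool],
    roll_back: Callable[[], None],
    *,
    poll_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Result:
    """Deploy, watch health signals for one stage budget, roll back on trouble."""
    result = Result(outcome="kept")

    # Stage 1: deployment.
    start = clock()
    deploy()
    result.seconds["deploy"] = clock() - start

    # Stage 2: monitoring. Any unhealthy signal ends the watch immediately.
    start = clock()
    while clock() - start < STAGE_BUDGET_SECONDS:
        if not healthy():
            result.outcome = "rolled_back"
            break
        sleep(poll_seconds)
    result.seconds["monitor"] = clock() - start

    # Stage 3: rollback, only if monitoring saw a problem.
    if result.outcome == "rolled_back":
        start = clock()
        roll_back()
        result.seconds["rollback"] = clock() - start
    return result
```

In practice `healthy` would check the metrics for existing functionality as well as the new feature, and the caller would alert on any result where `within_180` is false. The decision to roll back is made by the code, not by a person watching a dashboard, which is the change the 180 Rule is meant to force.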
Safe Software Deployments: Overcoming the Fear and Loathing of Pushing to Prod
Over the course of my career, I’ve had the privilege of deploying many different types of software. I’ve shipped CDs. I’ve pushed customer software over the web. I’ve updated database instances and control planes. And I’ve live-updated large, running, mission-critical systems. I call this a privilege because getting software into the hands of end users is what software engineers love most. But deployments are not all fun and games. And while each deployment presents its own unique challenges, there is one thing they all have in common: fear. Those of you responsible for significant software deployments know exactly what I’m talking about. You work, you prepare, you test. But when the day finally comes for your software to set sail, you are left hoping and praying it proves seaworthy on the Ocean of Production. In most companies, production is so different from your development and staging environments, that it’s almost impossible to know whether the code that worked in staging is going to succeed in production. Yet one thing is certain: if your software fails, everybody is going to know about it. Hence the fear. When it comes to understanding the effects of fear on the developer, I think Frank Herbert, author of the epic science-fiction saga Dune, said it best: “Fear is the mind-killer.” Fear undermines experimentation and the entrepreneurial spirit. It discourages risk-taking and leads to bad habits, like avoiding deployment for months. And worst of all, fear slows down the innovation process. (See my post on the Innovation Tax many organizations are paying, and don’t know it.) Pushing to production is undeniably scary. But over the last 30 years, working with my peers, I’ve developed a few methods for creating the conditions for safe, confident deployments. 
And my next four blogs in this series will unpack each of them in turn:
· The 180 Rule: Enabling fast, automated, easily reversible deployments
· Z Deployments: Limiting downtime from failed rollbacks
· The Goldilocks Gauge: Making the size and frequency of deployments just right
· Through the Looking Glass: Ensuring alignment between Dev, Stage, and Prod environments
These methodologies aren’t perfect and they won’t guarantee you a bug-free deployment. But they’re the best practices I’ve seen. And they help create a culture of confidence within an engineering team, which is the foundation of meaningful innovation. To get started, my next blog will explain the “180 Rule” to help you reduce outage minutes in production. In the meantime, feel free to share your own tips and techniques for safe deployments with @MarkLovesTech.
The Rise of the Strategic Developer
The work of developers is sometimes seen as tactical in nature. In other words, developers are not often asked to produce strategy. Rather, they are expected to execute against strategy, manifesting digital experiences that are defined by the “business.” But that is changing. With the automation of many time-consuming tasks -- from database administration to coding itself -- developers are now able to spend more time on higher value work, like understanding market needs or identifying strategic problems to solve. And just as the value of their work increases, so too does the value of their opinions. As a result, many developers are evolving, from coders with their heads-down in the corporate trenches to highly strategic visionaries of the digital experiences that define brands. “I think the very definition of ‘developer’ is expanding,” says Stephen “Stennie” Steneker, an engineering manager on the Developer Relations team at MongoDB. “It’s not just programmers anymore. It’s anyone who builds something.” Stennie notes that the learning curve needed to build something is flattening. Fast. He points to an emerging category of low code tools like Zapier, which allows people to stitch web apps together without having to write scripts or set up APIs. “People with no formal software engineering experience can build complex automated workflows to solve business problems. That’s a strategic developer.” Many other traditional developer tasks are being automated as well. At MongoDB, for example, we pride ourselves on removing the most time-consuming, low-value work of database administration. And of course, services like GitHub Copilot are automating the act of coding itself. So what does this all mean for developers? A few things: First, move to higher ground. 
In describing one of the potential outcomes of GitHub Copilot, Microsoft CTO Kevin Scott said, “It may very well be one of those things that makes programming itself more approachable.” When the barriers to entry for a particular line of work start falling, standing still is not an option. It’s time to up your strategic game by offering insight and suggestions on new digital experiences that advance the objectives of the business. Second, accept more responsibility. A strategic developer is someone who can conceive, articulate, and execute an idea. That also means you are accountable for the success or failure of that idea. And as Stennie reminded me, “There are more ways than ever before to measure the success of a developer’s work.” And third, never stop skilling. Developers with narrow or limited skill sets will never add strategic value, and they will always be vulnerable to replacement. Like software itself, developers need to constantly evolve and improve, expanding both hard and soft skills. How do you see the role of the developer evolving? Any advice for those who aspire to more strategic roles within their organizations? Reach out and let me know what you think at @MarkLovesTech.
4 Common Misperceptions about MongoDB
One year ago, in the middle of the pandemic, Dev Ittycheria, the CEO of MongoDB, brought me on as Chief Technology Officer. Frankly, I thought I knew everything about databases and MongoDB. After all, I’d been in the database business for 32 years already. I’d been on MongoDB’s Board of Directors and used the products extensively. And of course I’d done my due diligence, met the leadership team, and analyzed earnings reports and product roadmaps. Even with all that knowledge, this past year as MongoDB’s CTO has taught me that many of my preconceived notions were just plain wrong. This made me wonder how many other people might also have the wrong impression about this company. And this blog is my attempt to set those perceptions straight by sharing my four major revelations of the last year. My first revelation is that MongoDB is not trying to become this generation’s relational database. For years I assumed that MongoDB basically wanted to be a better, more modern version of Oracle when it grew up. In other words, compete with the huge footprint of Oracle and other commercial RDBMSs that have been the industry archetype for so long. I was way off. The whole point of MongoDB is to leave all those forms of archaic, legacy database technology in the historical dust. This was never supposed to be an evolution, but instead a revolution. Our founders not only envisioned the world's fastest and most scalable persistent store, but also one that would be programmed and operated differently. Embedded documents and structures, combined with automatic high availability and almost-infinite distribution capability, add up to a fundamentally different way of working with data, building applications, and running those applications in production. Oracle and the rest (SQL*Server, etc.) still hang their hats on E.F. Codd’s 51-year-old vision of rows and columns. 
To obtain high availability and distribution of data, you need add-ons, options packages, baling wire and duct tape. And you need a lot of database administrators. Not cheap. Even after all that, you’re still trailing the technological edge. This is how wrong I was. Our durable competitive advantages over these legacy data stores make competing with those products almost irrelevant. We instead focus on the modern needs of modern developers building modern applications. These developers need to create their own competitive advantage through language-native development, reliable deployments to production, and lightning-fast iteration. And the world is noticing; just check out the falling slope of Oracle and SQL*Server and the rising slope of MongoDB on the db-engines website. Which brings me to my second revelation: MongoDB was built for developers, by developers. I always knew that MongoDB was exceedingly fast and easy to program against. One time while I was bored in a meeting (yes, it happens here as well!), I built an Atlas database, loaded it with 350MB of data, downloaded and learned our Compass data discovery tool, built-in analytics aggregation pipelines, and our Charts package, and embedded live charts in a web page. This took me all of 19 minutes, end to end. To build something like that for engineers, it just has to be built by engineers, ones who are free to focus on all the rough edges that creep into products as features are added. I was first exposed to software planning and management over 40 years ago, and my LinkedIn profile shows a pretty diverse tour around the industry. Now, one year in, I can emphatically state that engineering and product at MongoDB are both different and better than at any company I’ve ever had the privilege to work at. 
Our executive leadership gives engineering and product broad brushstrokes of goals and desired outcomes, and then we work together to come up with detailed roadmaps, updated quarterly, that meet those goals in the way we think best, with no micromanagement. And we’re not afraid of 3-5 year projects, either. For example, multi-cloud was more than three years in the making. Also unlike any other company I’ve been at, we embrace the creation and repayment of tech debt, rather than sweeping it under the rug. We do this through giving our product and engineering teams huge amounts of context, delivered with candor and openness. And one more essential thing: we have an empowered program management team that improves processes (including killing them) as fast as we create them. In short, we paint the targets for our teams and let them decide how and when to shoot. They even design the arrows and bows. It’s true bottom-up engineering. Our engineers feel valued and understood. And that, in turn, empowers them to develop features that make our customers feel valued and understood, like a unified query language, or real-time analytics and charting directly in the console, or multi-region/multi-cloud clusters where all the networking cruft is taken care of for you. And this brings me to my third revelation: MongoDB is built for even the most demanding mission-critical applications. Fast? Yes. Easy? Of course. But mission-critical? That’s not how I saw MongoDB when I used Version 2 for a massive student data project 10 years ago. While it was the only possible datastore we could have chosen for the amount of data and the speed of ingestion and processing needed, it was pretty hard to set up and use in a 24 x 365 environment. MongoDB had gotten ahead of itself in the early 2010s. There was a gap between our capabilities and the expectations of the market. And it was painful. Other databases had had more than 30 years to solidify their systems and operations. We’d had five. 
But with Version 3 we added a new storage engine, full ACID transactions, and search. We built on it with Version 4. And then again with Version 5, released this week at our .Live conference. I knew about all this progress intellectually of course when I joined, but not viscerally. I came to realize that the security, durability, availability, scalability, and operability our platform offers (of course in addition to all the features that developers love too) was ideal for architecting fast-moving enterprise applications. And I found the proof in our customer list. It reads like a Who’s Who of major global banks, retailers, and telecommunications companies, running core systems like payments, IoT applications, content management, and real-time analytics. They use our database, data lake, analytics, search, and mobile products across their entire businesses, in every major cloud, on-premises, and on their laptops. And that leads me to my fourth and final revelation. MongoDB is no longer just a database. Of course, the database is still the core. But MongoDB now provides an enterprise-class, mission-critical data platform. A cohesive, integrated suite of offerings capable of managing modern data requirements across even the most sprawling digital estates, and scaling to meet the level of any company’s ambition, without sacrificing speed or security. Since the day I was first introduced to MongoDB’s products, I’ve had tremendous respect and admiration for the teams and their work. After all, I’m a developer, first and foremost. And it always felt like they “got” me. But had I known then what I know now, I would have jumped on this train a long time ago. In fact, I might have camped out on their doorstep with my resume in hand. And who knows? Maybe a bunch of people reading this will do just that, and have their own revelations about how fulfilling and exciting it can be to be at a great company, with a great culture, producing great products. 
I’ll write another letter a year from now, and let you know how it’s going then. In the meantime, please reach out to me here, or at @MarkLovesTech .
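4 Misconceptions about MongoDB
A year ago, at the height of COVID-19, MongoDB CEO Dev Ittycheria hired me as CTO (Chief Technology Officer). Frankly, I thought I already knew everything about databases and MongoDB. I had worked in the database industry for 32 years, served on MongoDB’s Board of Directors, and used MongoDB’s products extensively. Of course, I had also done my due diligence, met with the leadership team, and analyzed earnings reports and product roadmaps. Despite all that knowledge, a year of working as MongoDB’s CTO made me realize just how wrong my preconceptions were. It also made me think that many other people probably misunderstand MongoDB the way I did. So in this blog I want to set those perceptions straight by sharing four things I learned over the past year.
First, MongoDB’s vision is not to become this generation’s relational database. For years I assumed MongoDB was aiming to be a better, more modern version of Oracle. In other words, MongoDB appeared to be competing with Oracle’s huge footprint and with the other commercial RDBMSs that have long reigned as the industry archetype. My guess was way off. MongoDB’s ultimate goal was to break completely with all forms of outdated legacy database technology, and to get there by revolution rather than evolution. MongoDB’s founders envisioned not only the world’s fastest and most scalable persistent store, but one that could be programmed and operated in a different way. When embedded documents and structures are combined with automatic high availability and nearly unlimited distribution capability, the way you work with data, build applications, and run those applications in production changes fundamentally. Oracle, SQL*Server, and the like still rely on E.F. Codd’s 51-year-old vision of rows and columns. To achieve high availability and distribution of data, you need add-ons, option packages, baling wire, and duct tape. You also need a lot of database administrators, which is not cheap. Even after all that, you still trail the technological edge. That is how wrong I was. Given MongoDB’s durable competitive advantages over legacy data stores, competing with those products is practically irrelevant. Instead, we focus on the modern needs of today’s developers building modern applications. These developers need to create their own competitive advantage through language-native development, reliable deployments to production, and lightning-fast iteration. And the world is noticing. On the DB-Engines site alone, you can see Oracle and SQL*Server trending down while MongoDB trends up.
That brings me to my second realization: MongoDB is a product built by developers, for developers. I already knew that MongoDB was extremely fast and very easy to program against. One day, when a meeting was too boring (yes, it happens at MongoDB too!), I built an Atlas database and loaded 350MB of data into it. I then downloaded and learned MongoDB’s Compass data discovery tool, the built-in analytics aggregation pipelines, and the MongoDB Charts package, and embedded live charts in a web page. It took 19 minutes from start to finish. To build something like that for engineers, it has to be built by engineers who are free to focus on the rough edges that creep in as features are added to a product. I first encountered software planning and management more than 40 years ago, and my LinkedIn profile shows that I have done a wide variety of things in this industry. Now, a year into the CTO role, I can say with confidence that engineering and product at MongoDB are different from, and better than, those at any company I have worked for. MongoDB’s executive leadership gives the engineering and product organizations a broad outline of our goals and desired outcomes. We then work together to produce detailed roadmaps, updated quarterly, that meet those goals in the way we think best, without micromanagement. We are not afraid of three-to-five-year projects, either. Multi-cloud, for example, took more than three years to develop. Unlike other companies I have worked for, MongoDB actively takes on the creation and repayment of tech debt rather than sweeping it under the rug. We do this by giving our product and engineering teams the full context, candidly and openly. One more important point: we have an empowered program management team that improves processes (including eliminating them) as fast as we create them. In short, we set the targets for our teams and let them decide how and when to reach them. They even design the tools they use along the way. It is true bottom-up engineering. MongoDB’s engineers feel valued and understood by leadership. That in turn lets them build features that make customers feel the same way, such as a unified query language, real-time analytics and charting in the console, or multi-region/multi-cloud clusters that take care of all the networking cruft for you.
The third thing I learned is that MongoDB is built for even the most demanding mission-critical applications. Is it fast? Yes. Is it easy to use? Of course. But mission-critical? That is not how I saw MongoDB when I used Version 2 on a massive student data project 10 years ago. MongoDB was the only data store we could have chosen for the volume of data and the speed of ingestion and processing the work required, but it was quite hard to set up and run in an always-on, 24-hour environment. In the early 2010s MongoDB had gotten ahead of itself, and there was a gap between our capabilities and the market’s expectations. That was painful. Other databases had had more than 30 years to harden their systems and operations. MongoDB had had only five. With Version 3 we added a new storage engine, full ACID transactions, and search. We built on that with Version 4. Then Version 5 was officially released at the MongoDB.Live 2021 conference (July 2021). When I joined MongoDB I knew about this progress, but I had not really felt it. Then I came to realize that the security, durability, availability, scalability, and operability that the MongoDB platform provides (along with all the features developers love) are ideal for architecting fast-moving enterprise applications. I found the proof in MongoDB’s customer list. It reads like a who’s who of major global banks, retailers, and telecommunications companies running core systems such as payments, IoT applications, content management, and real-time analytics. They use MongoDB’s database, data lake, analytics, search, and mobile products across their entire businesses, in every major cloud, on-premises, and on their laptops.
The fourth and final thing I learned is that MongoDB is no longer just a database. Of course, the database is still MongoDB’s core. But MongoDB now provides an enterprise-class, mission-critical application data platform: an integrated suite of products that can manage modern data requirements across even the largest digital estates and scale to the level a company aspires to, without compromising speed or security. From the day I first encountered MongoDB’s products, I have admired and deeply respected the MongoDB teams and what they build. After all, I am a developer first and foremost. And it always felt like they “got” me. But had I known then what I know now, I would have joined MongoDB much sooner. I might even have camped out overnight at the entrance to MongoDB’s building with my resume in hand. Who knows? Maybe many people reading this will do just that, and discover how rewarding and exciting it is to build great products at a company with a great culture. I plan to write again a year from now and let you know how things stand then. Until then, leave a comment on this post or contact me at @MarkLovesTech.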