October 20, 2021 | Updated: November 8, 2021
Throughout my career, I've had the privilege of deploying many different kinds of software. I've shipped CDs, delivered customer software over the web, updated database instances and control planes, and performed live updates to large, mission-critical systems while they were running.
I call this a privilege because getting software into the hands of end users is what software engineers love most. But deployments aren't all fun and games. And while every deployment comes with its own unique challenges, one thing is common to all of them: fear.
If you're responsible for deploying important software, you know exactly what I mean. You build the software, stage it, and test it. Then the day finally comes for it to set sail, and you hope and pray it will navigate the seas of production smoothly. At most companies, production differs markedly from the development and staging environments, so there's no way to know whether code that worked in staging will also succeed in production. One thing is certain, though: if the software fails, everyone will know about it. That's what makes it scary.
There's a line that best captures what this fear does to developers. Frank Herbert, author of the science fiction novel Dune, wrote that "fear is the mind-killer." Fear saps the spirit of experimentation and bold exploration. It erodes the willingness to take risks and breeds bad habits, like postponing deployments for months at a time. Above all, it slows the pace of innovation (see my post on the innovation tax so many companies are paying).
Deploying to production is undeniably scary. But over the past 30 years, working with my colleagues, I've developed several methods for creating a deployment environment that is safe and inspires confidence. In the next four blog posts in this series, I'll look at each of them in turn:
· The 180 Rule - enabling automated deployments that can be rolled back quickly and easily
· Z Deployments - limiting the downtime caused by failed rollbacks
· The Goldilocks Gauge - getting the size and frequency of deployments just right
· Alignment Through Mirroring - keeping development, staging, and production environments in sync
These methods aren't perfect, and they don't guarantee a bug-free deployment. But in my experience, they are the best strategies available, and they help build a culture of confidence within engineering teams that makes meaningful innovation possible.
To get things started, my next blog post will introduce the "180 Rule," which helps reduce the minutes of downtime in production. In the meantime, feel free to share your own tips and techniques for safe deployments via @MarkLovesTech.
Safe Software Deployments: The 180 Rule
In my last post, I talked about the anxiety developers feel when they deploy software, and the negative impact that fear has on innovation. Today, I'm offering the first of four methods I've used to help teams overcome that fear: The 180 Rule.
Developers need to be able to get software into production, and if it doesn't work, back it out of production as quickly as possible and return the system to its prior working state. If they have confidence that they can detect problems and fix them, they can feel more confident about deploying.
All deployments have the same overall stages:
· Deployment: You roll the software from staging to production, either in pieces -- by directing more and more transactions to it -- or by flipping a switch. This involves getting binaries or configuration files reliably to production and having the system start using them.
· Monitoring: How does the system behave under live load? Do we have signals that the software is behaving correctly and performantly? It's essential that this monitoring focuses more on the existing functionality than just the "Happy Path" of the new functionality. In other words, did we damage the system through the rollout?
· Rollback: If there is any hint that the system is not working correctly, the change needs to be quickly rolled back from production. In a sense, a rollback is a kind of deployment, because you're making another change to the live system: returning it to a prior state.
The "180" in the name of the rule has a double meaning. Of course, we're referring here to the "180 degree" about-face of a rollback. But it's also a reference to an achievable goal of any deployment. I believe that any environment should be able to deploy software to production and roll it back if it doesn't work in three minutes, or 180 seconds.
This gives 60 seconds to roll binaries to the fleet and point your customers to them, 60 seconds to see if the transactions load or your canaries see problems, and then 60 seconds to roll back the binaries or configurations if needed. Of course, in your industry or for your product, you might need this to be shorter. But the bottom line is that a failed software deployment should not live in production for more than three minutes.
Developers follow these three stages all the time, and they often do it manually. I know what you're thinking: "How can any human being deploy, monitor, and roll back software that fast?" And that is the hidden beauty of the 180 Rule. The only way to meet this requirement is by automating the process. Instead of making the decisions ourselves, we must teach the computers how to gather the information and make the decisions themselves.
Sadly, this is a fundamental change for many companies. But it's a necessary change, because the alternative is hoping things will work while fearing that they will not. And that makes developers loath to deploy software.
Sure, there are a lot of tools out there that help with deployments. But this is not an off-the-shelf, set-it-and-forget-it scenario. You, as the developer, must provide those tools with the right metrics to monitor and the right scripts to both deploy the software and possibly roll it back. The 180 Rule does not specify which tools to use. Instead, it forces developers to create rigorous scripts and metrics, and ensure they can reliably detect and fix problems quickly.
There's a gotcha that many of you are thinking of: The 180 Rule is not applicable if the deployment is not reversible. For example, deploying a refactored relational schema can be a big problem, because a new schema might introduce information loss that prevents a rollback. Or the deployment might delete some old config files that aren't used by the new software.
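To make the three 60-second stages concrete, here is a minimal sketch of the kind of automation the rule forces you to build. The deploy(), healthy(), and rollback() hooks are hypothetical stand-ins for your own fleet tooling and canary metrics, not any particular product's API:

```python
import time

# A sketch of the 180 Rule as an automated loop. Each stage gets a
# 60-second budget: deploy() rolls binaries/config to the fleet,
# healthy() checks canary and existing-functionality signals, and
# rollback() returns the system to its prior working state.

STAGE_BUDGET_SECONDS = 60

def run_deployment(deploy, healthy, rollback,
                   monitor_interval=5,
                   clock=time.monotonic, sleep=time.sleep):
    """Deploy, monitor live signals, and roll back automatically on any
    hint of trouble. Returns True if the deployment stays in production."""
    deploy()  # stage 1: roll out (should finish within 60 seconds)

    deadline = clock() + STAGE_BUDGET_SECONDS
    while clock() < deadline:  # stage 2: monitor under live load
        if not healthy():
            rollback()  # stage 3: the 180-degree about-face
            return False
        sleep(monitor_interval)
    return True  # signals stayed clean for the full budget; keep it
```

The clock and sleep parameters are injectable so the loop can be exercised in tests without waiting; the property that matters is that no human sits between a bad signal and the rollback.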
I’ll talk more about how to avoid wicked problems like these in my subsequent posts. But for now, I’m interested to hear what you think of The 180 Rule, and whether you’re using any similar heuristics in your approach to safe deployment.
Building AI With MongoDB: Optimizing the Product Lifecycle with Real-Time Customer Data
Over the course of our Building AI with MongoDB blog post series, we've seen many organizations using AI to shape product development and support. Examples we've profiled so far include:
· Ventecon's co-pilot helping product managers generate and refine specifications for new products
· Cognigy's conversational AI solutions empowering businesses to provide instant and personalized customer service in any language and for any channel
· Kovai's AI assistant helping users quickly discover information from product documentation and knowledge bases
In this roundup of the latest AI builders, I'll focus on three more companies innovating across the product lifecycle. We'll start with Zelta, which helps teams prioritize product roadmaps using live customer insights and sentiment. Then I'll move on to Crewmate, which connects products to engaged communities of users. We'll wrap with Ada, which helps product companies like Meta and Verizon better support their customers through AI-driven automation.
Check out our AI resource page to learn more about building AI-powered apps with MongoDB.
Zelta.AI: Prioritizing product roadmaps with data-driven customer analytics
Today's digital economy means customer feedback streams into the enterprise from a multitude of physical and digital touchpoints. For product managers, it can seem an impossible task to synthesize this feedback into themes and priorities that underpin a coherent development plan everyone in the business commits to. This is the problem Zelta.ai was founded to address.
Zelta uses generative AI to surface insights into customer pain points found in companies' most valuable asset: qualitative sources of customer feedback such as call transcripts and tickets, pulling directly from platforms like Gong, Zoom, Fireflies, Zendesk, Jira, and Intercom, among others.
Zelta leverages LLMs to process unstructured data and return actionable insights for product teams.
The company's engineering team uses a combination of fine-tuned OpenAI GPT-4, Cohere, and Anthropic models to extract, classify, and encode source data into trends and sentiment around specific topics and features. MongoDB Atlas is used as the data storage layer for source metadata and model outputs.
"The flexibility MongoDB provides us has been unbelievable," says Mick Cunningham, CTO and Co-Founder at Zelta AI. "My development team can constantly experiment with new features, just adding fields and evolving the data model as needed without any of the expensive schema migration pains imposed by relational databases."
Cunningham goes on to say, "We also make heavy use of the MongoDB aggregation pipeline for application-driven intelligence. Without having to ETL data out of MongoDB, we can analyze data in place to provide customers with real-time dashboards and reporting of trends in product feedback. This helps them make product decisions faster, making our service more valuable to them."
Looking forward, Zelta plans on creating its own custom models, and MongoDB will prove invaluable as a source of labeled data for supervised model training. Zelta is a member of the MongoDB AI Innovators program, taking advantage of free Atlas credits, access to technical support, and exposure to the wider MongoDB community.
Crewmate: Helping brands connect with their communities
In the digital economy, brands can spend millions of dollars growing online communities populated with highly engaged users of their products and services. However, many of the tools used for building communities are third-party solutions that abstract away a brand's visibility into user engagement. This is the issue Crewmate is working to address. Crewmate is a no-code builder for embedded AI-powered communities.
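As a hedged illustration of the analyze-in-place approach Cunningham describes, a feedback-trend report might be expressed as a single aggregation pipeline. The collection and field names below (feedback, topic, sentiment, created_at) are illustrative assumptions, not Zelta's actual schema:

```python
# Group feedback documents by topic and month, then average a
# model-produced sentiment score. All names here are hypothetical.
feedback_trends_pipeline = [
    {"$match": {"source": {"$in": ["gong", "zendesk", "intercom"]}}},
    {"$group": {
        "_id": {
            "topic": "$topic",
            "month": {"$dateTrunc": {"date": "$created_at", "unit": "month"}},
        },
        "avg_sentiment": {"$avg": "$sentiment"},
        "mentions": {"$sum": 1},
    }},
    {"$sort": {"_id.month": 1, "mentions": -1}},
]

# With pymongo, this runs directly against the live collection,
# with no ETL step in between:
# results = db.feedback.aggregate(feedback_trends_pipeline)
```

Because the pipeline runs where the data lives, dashboards like these reflect feedback as it arrives rather than after a warehouse load.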
The company's builder provides customizable communities for brands to deploy directly onto their websites. Crewmate is already used today across companies in consumer packaged goods (CPG), B2B SaaS, gaming, Web3, and more.
Crewmate starts by scraping a brand's website, along with open job postings and customer data from CRM systems. Scraped data is stored in its MongoDB Atlas database running on Google Cloud. An Atlas Trigger then calls OpenAI's ada-002 embedding model, storing and indexing the vectorized encodings into Atlas Vector Search. An event-driven pipeline keeps the embeddings fresh by firing the Atlas Trigger as soon as new website data is inserted into the MongoDB database.
Using context-aware semantic search powered by Atlas Vector Search, users hitting and browsing the community pages on a brand's website are automatically served relevant content. This includes posts from social media feeds, forum discussions, job postings, special offers, and more.
"I've used MongoDB in past projects and knew that its flexible document schema would allow me to store data of any structure. This is particularly important when ingesting many different types of data from my clients' websites," says Raj Thaker, CTO and Co-Founder of Crewmate. "The introduction of Atlas Vector Search and the Building Generative AI Applications tutorial gave me a fast, ready-made blueprint that brings together a database for source data, vector search for AI-powered semantic search, and reactive, real-time data pipelines to keep everything updated, all in a single platform with a single copy of the data and a unified developer API. This keeps my engineering team productive and my tech stack streamlined. Atlas also provides integrations with the fast-evolving AI ecosystem. So while today I'm using OpenAI models, I have the flexibility to easily integrate with other models, such as Llama, in the future."
Thaker goes on to say, "One of Crewmate's major value creations is the insights brands can extract. Using the powerful and expressive MongoDB Query API, I can process, aggregate, and analyze user engagement data so that brands can track community outreach efforts and conversions. They can generate this intelligence directly from their app data stored in MongoDB, avoiding the need to ETL it out into a separate data warehouse or data lake."
Like Zelta, Crewmate is also part of MongoDB's AI Innovators program.
Ada: Revolutionizing customer service with AI-powered automations built on MongoDB Atlas
Founded in 2016, Ada has become a leader in automating complex service interactions across any channel and modality. The company has raised close to $200 million, has 300 employees, and counts Meta, Verizon, and AT&T among its 300 customers.
Mike Gozzo, Ada's Chief Product and Technology Officer, was interviewed at a recent MongoDB developer conference, where he discussed the evolution of AI for customer service and the role MongoDB plays in Ada's AI stack. Gozzo makes the point that while bots for customer service aren't new, the huge advancements in transformer models and LLMs, coupled with reinforcement learning from human feedback (RLHF), have made these assistants far more capable. Rather than just search for information, they can use advanced reasoning to solve customer problems.
Asked why Ada selected MongoDB Atlas to underpin all its products, Gozzo says, "Having the flexibility and ability to just pivot on a dime was really important. We saw that as we advanced the company and brought in new channels and new modalities, having one data store that can be easily extended without crazy migrations and that would really support our needs was absolutely clear from MongoDB.
We've always stayed the path with Atlas because the performance is there, the support from the team is great, and we believe in having less dependency on one central cloud vendor that MongoDB allows."
Gozzo goes on to say, "Using MongoDB means we're not limited in how we source data if we want to build something new. We can query unstructured data and use it to train other models. We use generative AI effortlessly throughout our product stack to automate queries and provide support that goes beyond just answering multi-step queries. With MongoDB, we're able to ship new products in just a few months."
Going forward, Ada is starting to use MongoDB Change Streams to build a distributed event processing system that powers bots and analytics. It is also exploring Queryable Encryption, which helps advance AI training while keeping conversations private.
Getting started
Check out our library of AI case studies to see the range of applications developers are building with MongoDB. Our 3-minute explainer video on Atlas Vector Search is a great way to assess what's possible as you start on your journey to AI-powered apps.
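As a closing sketch of the event-driven pattern both Crewmate (trigger-fired embedding refresh) and Ada (change-stream-powered event processing) describe, here is one way to react to newly inserted documents. The embed() function, collection names, and document fields are hypothetical; the watch loop itself requires a live MongoDB replica set and is shown only as a comment:

```python
# Turn a change-stream insert event into a derived embedding update.
# Field names ("text") and the embed() callable are illustrative
# assumptions, not any company's actual schema or model API.

def handle_insert(change, embed):
    """Map a change-stream event to an embedding document, or None
    if the event is not an insert we care about."""
    if change.get("operationType") != "insert":
        return None
    doc = change["fullDocument"]
    return {
        "_id": doc["_id"],
        "embedding": embed(doc.get("text", "")),
    }

# Against a live cluster with pymongo, the loop would look like:
# with db.pages.watch([{"$match": {"operationType": "insert"}}]) as stream:
#     for change in stream:
#         update = handle_insert(change, embed=my_embedding_model)
#         if update:
#             db.page_embeddings.replace_one(
#                 {"_id": update["_id"]}, update, upsert=True)
```

Keeping the event-to-update mapping as a pure function makes the pipeline easy to test without a cluster, while the change stream (or an Atlas Trigger) supplies the events in production.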