Introducing Snapshot Distribution in MongoDB Atlas
Data is at the heart of everything we do, and in today’s digital economy it has become an organization’s most valuable asset. But the lengths required to protect that data can present added challenges and result in manual processes that ultimately slow development, especially when it comes to maintaining a strict backup and recovery strategy. MongoDB Atlas aims to ease this burden by providing the features organizations need not only to retain and protect their data for recovery purposes, but also to meet compliance regulations with ease.

Today we’re excited to announce the release of a new backup feature, Snapshot Distribution. Snapshot Distribution allows you to easily distribute your backup snapshots across multiple geographic regions within your primary cloud provider with the click of a button. You can configure how snapshots are distributed directly within your backup policy, and Atlas will automatically distribute them to the regions you select, with no manual process necessary.

How to distribute your snapshots

To enable Snapshot Distribution, navigate to the backup policy for your cluster and select the toggle to copy snapshots to other regions. From there, you can add any number of regions within your primary cloud provider (including regions you are not deployed in) to store snapshot copies. You can even customize your configuration to copy only specific types of snapshots to certain regions.

Restore your cluster faster with optimized, intelligent restores

If you need to restore your cluster, Atlas will intelligently decide whether to use the original snapshot or a copied snapshot for optimal restore speeds. Copied snapshots may be used when you are restoring to a cluster in the same region as a snapshot copy, including multi-region clusters if the snapshots are copied to every cluster region.
Alternatively, if the original snapshot becomes unavailable due to a regional outage within your cloud provider, Atlas will use a copy in the nearest region, enabling restores regardless of the outage.

Get started with Snapshot Distribution

Although storing additional snapshot copies in multiple places may not always be required, it can be extremely useful in several situations, such as:

- Organizations with a compliance requirement to store backups in geographic locations separate from their primary place of operation
- Organizations operating multi-region clusters that want faster direct-attach restores for the entire cluster

If you fall into either of these categories, Snapshot Distribution may be a valuable addition to your current backup policy, allowing you to automate previously manual processes and free up development time to focus on innovation. Check out the documentation to learn more, or navigate to your backup policy to enable this feature.
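For teams that manage backup policies programmatically rather than through the Atlas UI, the copy configuration described above might be expressed as a payload like the sketch below. The field names here are assumptions modeled on the Atlas Admin API’s backup schedule resource, so verify them against the current Atlas documentation before relying on them.

```python
# Illustrative only: field names are assumptions modeled on the Atlas Admin
# API backup-schedule resource; check the current Atlas docs before use.
copy_settings = [
    {
        "cloudProvider": "AWS",              # must match your primary provider
        "regionName": "US_WEST_2",           # region to receive snapshot copies
        "frequencies": ["DAILY", "WEEKLY"],  # copy only these snapshot types
        "shouldCopyOplogs": False,           # skip oplog copies if not needed
    }
]
payload = {"copySettings": copy_settings}
print(payload["copySettings"][0]["regionName"])
```

A payload like this would be sent when updating the cluster’s backup schedule; adding more entries to the list distributes copies to additional regions.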
Built With MongoDB: Satori Streamlines Secure Data Access
Handling data imposes contradictory responsibilities on organizations. On one hand, they need to protect data from unauthorized access. On the other hand, they need to extract value from data; otherwise, why collect it in the first place? The contradiction lies in the fact that to extract value from data, you have to grant access to it, but unregulated access to data can lead to its misuse. Data access service provider Satori enables organizations to accelerate their data use by simplifying and automating access policies while helping to ensure compliance with data security and privacy requirements. In addition to being a member of the MongoDB for Startups program, Satori has just added support for MongoDB workloads, so organizations running MongoDB can now take advantage of Satori’s secure data access service.

Balancing act

Despite the immense volume of sensitive personal, financial, or health-related data within most organizations, managing access to that data is often a manual process handled by a small team juggling competing priorities. Satori chief scientist Ben Herzberg says this task of managing data access at companies is slowing down innovation. “The majority of organizations are still managing access to data in a manual way,” Herzberg says. “Everyone is feeling the bottleneck. The data analyst who wants to do their job in a meaningful way just wants to understand what data sets they can use and get access to it fast.” Getting access to data can be an uphill battle, however. “Sometimes you have to go through three or four different teams to get access to data,” Herzberg says. “It can take a week or two.” Meanwhile, the data engineers who are primarily responsible for managing access to data are getting pulled away from their core responsibilities. “This places the company in an uncomfortable position of having time-intensive processes implemented by teams who would prefer to be working on other tasks,” Herzberg says.
Simple, fast, secure

As a data access service, Satori streamlines access to data, accelerates time-to-value, improves engineering productivity, and reduces complexity and operational risk, all while protecting sensitive data and maintaining compliance with relevant data privacy regulations. The first step in protecting sensitive data is identifying it, but according to Satori’s research, few companies have a system in place that continuously monitors for and discovers sensitive data. Organizations that do monitor sensitive data typically do so only quarterly or annually. Herzberg says Satori continuously discovers sensitive data as it’s being accessed. “As one of our customers said: I want to remain continuously compliant. I want to know where my sensitive data is at all times. We do that,” Herzberg says. Data users can request access over Slack, through the Satori data portal, and via other integrations, getting immediate access without any engineering effort, changes to infrastructure, schemas, or tables, or creating objects on the database. “When a lot of people want access to data, you need a simple, fast, and secure way to do it without exposing yourself to risk,” Herzberg says. Instead of taking days or weeks to process data access requests, with Satori, it takes just minutes.

Build the next big thing with MongoDB

Satori chose MongoDB early on because of the inherent flexibility of the document data model. “We chose MongoDB to move quickly and without limitations,” Satori software engineering manager Oleg Toubenshlak says. “We didn’t know what type of data we would be storing or how we might want to extend objects, so we chose MongoDB because of the flexibility of the data model.” “MongoDB is a core component of our infrastructure where we keep customer configurations,” Toubenshlak says. “We started with MongoDB deployed on-prem and moved to MongoDB Atlas.”
Toubenshlak cites continuous backups, easy deployment, and scalability as additional Atlas capabilities he finds valuable. “MongoDB allows us to move fast with development so we can focus on other areas. It’s very simple in terms of security and network access. In terms of clients, MongoDB Atlas helps us provide extended capabilities in order to map our Java objects to BSON. It’s very compatible and does this very quickly. Once we moved to Atlas, all our problems were solved,” he says. Toubenshlak also appreciates the help he received as a member of the startup program. “We had startup credits, and we used professional services to make sure everything was configured properly,” he says. “Satori is a small cluster for MongoDB, but I’m very surprised at the time investment we’ve received.” The company is also excited about adding MongoDB Atlas to its list of supported platforms. “Adding MongoDB support is very exciting for us,” Herzberg says. “We’re already working with some design partners in different industries and helping them with their deployment. It’s a meaningful step for us in NoSQL databases. We’re seeing a lot of traction with existing customers that want to expand their MongoDB deployments and with new customers.” If you’re running MongoDB and are interested in simplifying data access, visit Satori and set up a demo or test drive. Are you part of a startup and interested in joining the MongoDB for Startups program? Apply now.
4 Reasons Why Your Tech Company Should Launch a Podcast
Podcasts, originally known as audioblogs, are a relatively new content format. The first podcast didn’t launch until around 2004, so it makes sense that many organizations have not, historically, considered podcasting a top priority. Now there are podcasts centered on almost any topic. From true crime and comedy to finance and pop culture, podcasts are quickly becoming one of the most popular mediums for learning and entertainment, with 177 million listeners in 2022. As the producer of the MongoDB podcast, I spend a majority of my time thinking about what folks in the database world want to know more about. I have had the privilege of meeting some incredible people in the tech community and have witnessed the impact a podcast can have. There are many reasons why your tech company should consider developing a podcast; let’s look at my top four.

Your podcast audience already exists

As a tech organization, you likely already know who you want to reach. Your audience is waiting for you to deliver more content and more learning and storytelling experiences. If you are aiming to reach developers or technical leaders and thinkers, podcasting is an ideal way to achieve this goal. LinkedIn research shows that tech professionals engage with content that helps their skill development and is relevant to their industry, and that they enjoy hearing from influencers. Podcasts meet all three of these preferences. Tech podcasts revolve around tech-based stories or news, are relevant to others in the field, and many episodes include a guest speaker to inform and influence listeners. Another key driver of podcast success is its more relaxed and natural tone. Podcasts are conversational, and 8 out of 10 tech professionals say they interact more with quality information that is not “overloaded with jargon.” Podcasts help you reach your communities and increase reach easily and effectively.
Your audience is out there waiting for your expert thoughts to hit their airwaves.

Podcasts are flexible

One perk of a podcast is its convenience and flexibility. Podcasts meet people where they are: literally, anywhere they are. Listeners have a lot of flexibility with podcasting. They can listen as they work, exercise, or commute. They can start, stop, pause, and continue at the touch of a button. Podcasts give you the ability to transform existing, well-performing content into a new format. People learn differently, and 30% of people are more auditory learners. Repurposing written content into a podcast gives you the ability to reach new members of your audience and allows for expansion on a topic in ways that may not exist in the written format. Your organization is rife with experts, partners, customers, stories, and content in other formats. Add sound to those ideas with a podcast. Conversely, recording a podcast on video provides both an audio-only and a video asset. Further, transcripts from the episode can be reworked into a blog or infographic on the same topic, and using the podcast recording as a subject-matter expert interview allows you to write additional content around the topics of conversation within the episode. Moreover, listening to podcasts doesn’t feel like a chore or work. Podcasts blur the line between learning (in this case, about technology and your product or service) and entertainment, making listeners less resistant to your message.

Podcasts let your community connect with industry leaders

Ideally, you want your organization and its technical experts to be vocal, constantly sharing their opinions, thoughts, and discoveries. Podcasts are a great way to amplify your subject-matter expert voices and position your organization as a go-to place for learning and guidance.
But it’s not just your own in-house experts that you can showcase; podcasts are also a platform to connect with other industry leaders and bring more diverse perspectives to the show. Podcasts can also help leaders who are more comfortable as speakers than as writers; they can take part in the development of content easily and with little preparation. Your organization likely has a treasure trove of compelling stories and ideas, all living within the minds of your leaders. Hearing leaders and industry thinkers on your organization’s podcast helps to build a culture of excellence, inspiring others to take part or suggest new topics or guests.

Podcasting helps grow your community

Podcast audiences are some of the most engaged audiences today. Research has found that 80% of listeners finish the entire episode each time and listen to an average of 7 shows per week. Podcast listeners have also been found to be more loyal, making them 20% more likely to follow your organization on social media. This level of engagement leads to a community built around common interests and ideas, even to the point of mobilizing audiences. For example, Manoush Zomorodi, host of the WNYC podcast Note to Self, encouraged her listeners to join a challenge to detach themselves from technology and focus on creative projects. More than 20,000 listeners engaged in the challenge. When people with common ground come together, they are more likely to engage, react, and even donate to keep that community alive. Marc Maron, host of the WTF podcast, says that 10% of his audience pays up to $8.99 monthly to support the podcast. Over the years, I’ve found that community engagement comes from responsiveness and interaction across several channels.
I regularly engage with listeners to encourage feedback, and I respond to comments on LinkedIn, Twitter, Instagram, in our community forums, and even at live events. This sense of community deepens the appreciation I have, and that I hope my listeners have, for our jobs and the technology industry overall.

Want to be a guest on The MongoDB Podcast?

I will be live at AWS re:Invent 2022 in Las Vegas. Reach out to me if you have a great story idea and would like to take part in an in-person recording. Swing by the MongoDB booth, or be sure to see me delivering the keynote demonstration on day one of the event! If you haven’t tuned into The MongoDB Podcast yet, you can subscribe on Apple Podcasts, Spotify, or wherever you find your podcasts.
Migrate Your Mindset to the Cloud Along with Your Data — A Conversation with Mark Porter and Accenture’s Michael Ljung
The challenges of getting data and applications into the cloud are well known. Technology isn’t always the hardest part of cloud migration, however, and it won’t produce digital transformation on its own. In many cloud migrations, both people and processes need to change along with the technology. That’s because the processes that work in a legacy environment won’t necessarily help an organization thrive in the cloud. Instead, the opposite can happen: Legacy procedures tend to produce legacy results, making it difficult to achieve the impact that so many organizations seek from cloud-based digital transformation. As part of an ongoing series on cloud migration and digital transformation, MongoDB CTO Mark Porter sat down with Michael Ljung, Accenture’s Global Engineering Lead, to examine new approaches and new ways of thinking that can be crucial to success in the cloud.

Experience and perspective

During their conversation, both Porter and Ljung recounted situations in which they, and their organizations, were called upon to partner in a new way with clients that were struggling to migrate to the cloud. Each knew that their experience and perspective could lead to success for these organizations. They also knew that their message, that sometimes it’s necessary to go slow to go fast, might not find receptive ears. When organizations bring their old procedures and old deployment technologies to the cloud, “They’re in two places for a little while,” Porter says. Their data and applications may be in the cloud, but their mindset is on premises. Porter says that the MongoDB team helped a large cryptocurrency exchange through this exact situation. MongoDB helped the exchange get through the learning curve associated with new technologies, acting as an embedded member of the team and even guiding them in setting quarterly goals for their migrations. Ljung described a large government client that wanted to move to the cloud and do it quickly.
The organization embraced agile methodologies but didn’t have the automation or the experience with CI/CD to support cloud development. They were releasing new code to production almost daily, but a fix in one place could easily cause a breaking change elsewhere, and often did.

Digital done right

The solution was to take a step back. Accenture started by supporting the organization in mastering incremental delivery. Next came some basic automation. With that in place, the organization was able to return to agile methodologies and organize itself into sprints. Now, Ljung says, “This client is an example of digital transformation done right,” all because, as he and Porter agreed, they were willing to go slow to go fast. Watch the full video series with Mark Porter and Michael Ljung to learn more about the strategies that support successful cloud migration and digital transformation.
4 Ways to Create a Zero Trust Environment in Financial Services
For years, security professionals protected their IT much like medieval guards protected a walled city: they made it as difficult as possible to get inside. Once someone was past the perimeter, however, they had generous access to the riches within. In the financial sector, this meant access to personally identifiable information (PII), including a “marketable data set” of credit card numbers, names, social security numbers, and more. Sadly, such breaches occurred in many cases, adversely affecting end users. A famous example is the Equifax incident, where a breach led to years of unhappy customers. Since then, the security mindset has changed. As users increasingly access networks and applications from any location, on any device, on platforms hosted in the cloud, the classic point-to-point security approach is obsolete. The perimeter has changed, so reliance on it as a protective barrier has changed as well. Given the huge amount of confidential client and customer data the financial services industry deals with on a daily basis, and the strict regulations that govern it, security needs to be an even higher priority. The perceived value of this data also makes financial services organizations a primary target for data breaches. In this article, we’ll examine a different approach to security, called zero trust, that can better protect your assets.

Paradigm shift

Zero trust presents a new paradigm for cybersecurity. In a zero trust environment, the perimeter is assumed to have been breached; there are no trusted users, and no user or device gains trust simply because of its physical or network location. Every user, device, and connection must be continually verified and audited. Here are four concepts to know about creating a zero trust environment.

1. Securing the data

Although ensuring access to banking apps and online services is vital, the database, which is the backend of these applications, is a key part of creating a zero trust environment.
The database contains much of an organization’s sensitive, and regulated, information, along with data that may not be sensitive but is critical to keeping the organization running. Thus, it is imperative that a database be ready and able to work in a zero trust environment. As more databases become cloud-based services, an important aspect is ensuring that the database is secure by default, meaning it is secure out of the box. This approach takes some of the responsibility for security out of the hands of administrators, because the highest levels of security are in place from the start, without requiring attention from users or administrators. To allow access, users and administrators must proactively make changes; nothing is automatically granted. As more financial institutions embrace the cloud, securing data can get more complicated. Security responsibilities are divided between the clients’ own organization, the cloud providers, and the vendors of the cloud services being used. This approach is known as the shared responsibility model. It moves away from the classic model in which IT owns hardening of the servers and security, then hardens the software on top (for example, the version of the database software), and then hardens the actual application code. In the shared responsibility model, the hardware (CPU, network, storage) is solely in the realm of the cloud provider that provisions these systems. The service provider for a data-as-a-service model then delivers the database to the client already hardened, with a designated endpoint. Only then do the actual client team and their application developers and DevOps team come into play for the actual solution. Security and resilience in the cloud are only possible when everyone is clear on their roles and responsibilities.
Shared responsibility recognizes that cloud vendors ensure that their products are secure by default, while still available, but also that organizations must take appropriate steps to continue to protect the data they keep in the cloud.

2. Authentication for customers and users

In banks and finance organizations, there is a lot of focus on customer authentication, or making sure that accessing funds is as secure as possible. It’s also important, however, to ensure secure access to the database on the other end. An IT organization can use various methods to allow users to authenticate themselves to a database. Most often, the process includes a username and password. But, given financial services organizations’ heightened need to maintain the privacy of confidential customer information, this step should be viewed as only a base layer. At the database layer, it is important to have transport layer security and SCRAM authentication, which enable traffic from clients to the database to be authenticated and encrypted in transit. Passwordless authentication should also be considered, not just for customers but for internal teams as well. This can be done in multiple ways with the database; for example, auto-generated certificates may be required to access the database. Advanced options exist for organizations already using X.509 certificates that have a certificate management infrastructure.

3. Logging and auditing

In the highly regulated financial industry, it is also important to monitor your zero trust environment to ensure that it remains in force and encompasses your database. The database should be able to log all actions or have functionality to apply filters to capture only specific events, users, or roles. Role-based auditing lets you log and report activities by specific roles, such as userAdmin or dbAdmin, coupled with any roles inherited by each user, rather than having to extract activity for each individual administrator.
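As a sketch of what role-based audit filtering can look like, the snippet below builds a filter document that captures only events generated by users holding a given role. The filter shape follows MongoDB’s audit filter syntax (auditing is an Enterprise/Atlas feature), but treat the specific fields as an illustration and confirm them against the auditing documentation for your deployment.

```python
import json

# Audit only actions performed by users holding the userAdmin role on the
# admin database. Field names follow MongoDB's audit filter syntax, but
# confirm them against the auditing docs for your server version.
audit_filter = {"roles": {"role": "userAdmin", "db": "admin"}}

# The filter is passed to mongod as a JSON string, for example:
#   mongod --auditDestination file --auditFormat JSON \
#          --auditPath /var/log/mongodb/audit.json \
#          --auditFilter '<filter JSON>'
flag = f"--auditFilter '{json.dumps(audit_filter)}'"
print(flag)
```

Because the filter is an ordinary query document, the same approach extends to filtering by event type, user, or namespace rather than role.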
Role-based auditing makes it easier for organizations to enforce end-to-end operational control and maintain the insight necessary for compliance and reporting.

4. Encryption

With large amounts of valuable data, financial institutions also need to make sure that they are embracing encryption: in flight, at rest, and even in use. Securing data with client-side, field-level encryption allows you to move to managed services in the cloud with greater confidence. The database only works with encrypted fields, and organizations control their own encryption keys rather than having the database provider manage them. This additional layer of security enforces an even more fine-grained separation of duties between those who use the database and those who administer and manage it. Also, as more data is transmitted and stored in the cloud, some of it from highly sensitive workloads, additional technical options to control and limit access to confidential and regulated data are needed. However, this data still needs to be used, so ensuring that in-use data encryption is part of your zero trust solution is vital. This approach enables organizations to confidently store sensitive data, meeting compliance requirements while also enabling different parts of the business to gain access and insights from it.

Conclusion

In a world where security of data is only becoming more important, financial services organizations rank among those with the most to lose if data gets into the wrong hands. Ditching the perimeter mentality and moving toward zero trust, especially as more cloud and as-a-service offerings are embedded in infrastructure, is the only way to truly protect such valuable assets. Learn more about developing a strategic advantage in financial services. Read the ebook now.
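To make the client-side, field-level encryption idea from the encryption section above concrete, here is a toy sketch of the data flow: the client encrypts a sensitive field before it ever reaches the database, and only the client holds the key. The XOR “cipher” below is a deliberately insecure stand-in for the authenticated ciphers real field-level encryption uses; it illustrates the separation of duties only, not a production technique.

```python
import os

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy stand-in for a real cipher. XOR with a repeating key is NOT
    # secure; it only demonstrates that the field leaves the client as
    # opaque bytes. Applying it twice with the same key round-trips.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = os.urandom(32)           # held by the client, never by the database
ssn_plain = b"123-45-6789"

# What the application would send: the sensitive field is already opaque,
# so the database (and its operators) only ever handle ciphertext.
document = {"name": "Ada", "ssn": xor_cipher(ssn_plain, key)}

assert document["ssn"] != ssn_plain                    # server sees no plaintext
assert xor_cipher(document["ssn"], key) == ssn_plain   # client can round-trip
```

The key property being modeled is that decryption is only possible where the key lives, which is exactly the separation of duties between database users and database operators that the article describes.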
Women’s Advocacy Summit Recap: The Value of Inclusive Cultures
It’s July 26, 2022, and Sandhya Parameshwara, Managing Director, Accenture, opens the Women’s Advocacy Summit with a stark wake-up call: There are clear disconnects between business leaders’ perceptions of the importance of workplace culture and inclusivity and those of their employees and the wider public, especially millennials. Many leaders see culture as difficult to measure and link to business performance; consequently, other issues often take a higher priority. Parameshwara, however, points to research suggesting that businesses with a strong focus on culture and equality also have staff, particularly women, who are more likely to reach senior positions and benefit from growth through innovation. Ahead of this curve are the people Parameshwara describes as “culture makers”: those who recognize the importance of an inclusive culture and reward those who strive to achieve it. “Culture makers are the people who say, who do, and then who drive,” she explains. “They are self-aware. They are relevant in the marketplace. They recognize and see the importance of the culture. They promote and advocate progress.” This notion set the tone for the rest of the Women’s Advocacy Summit, an event hosted in collaboration with the MongoDB Women’s Group, AT&T’s Women’s Group, and Women in Samsung Electronics. Two hundred women tech leaders and their allies came together to discuss the inequality that women continue to face in the workplace, how companies will forge ahead to accelerate their organizations’ equality, and how they’ll work to retain and cultivate their female talent.

The power of courage

Anne Chow, who recently retired as CEO of AT&T Business, is a clear example of a culture maker. Chatting with MongoDB CEO Dev Ittycheria, Chow discusses the value of positive change and shifting corporate dynamics. “There’s no question that the future and our present require leaders to become truly inclusive,” she says.
“It’s an evolving art and an evolving science.” Chow also believes there has been an evolution in corporate structures. “The power is flipped. It’s now in the hands of employees,” she explains. “One of the key things about being an inclusive leader is we need to meet people and align with where they want to be and where they want to go.” For Chow, positive change is “so desperately needed, across our businesses, across society, across the community,” and driving inclusivity requires a particular set of skills and attitudes. “Courage, especially moral courage, is one of the most foundational characteristics of great leadership,” Chow says. “You also need the realization that mistakes are simply part of the journey.” Ittycheria recalls an adage he gives his children: “Success is not the absence of problems, it’s the ability to deal with them.” He adds, “Hope is not a strategy; you have to take a proactive approach. You have to find a way to navigate the difficult issues.” One of the difficult issues that women, especially if they’re parents, often struggle with is work-life balance, although this is a concept that Chow challenges. “One of my famous sayings is, ‘Balance is bogus.’ Why? You have one life that has personal characteristics and professional characteristics, and you are leading that one life.” Chow prefers to view life as an “optimization equation” in which you can have it all, just not necessarily at the same time. She also says that leaders must recognize that attitudes will vary. “What are you trying to optimize to? There is no answer that Dev or I or anybody could give you that’s going to inform you what the right choice is for you.”

Pay it forward

A panel discussion brings a wider perspective as Asya Kamsky, a principal engineer at MongoDB, invites four women leaders to share their views.
Key themes include the importance of support networks, juggling the responsibilities of work and parenting, and the obligation to mentor women as they build their careers. Having grown up in India and Africa, Anjali Nair, Microsoft’s VP of Azure Operators, is familiar with cultural biases in technology. And while things have changed in the past few decades, she still believes there is a long way to go before the balance of representation is fully redressed. “It’s really about women uplifting and sponsoring each other,” she says. “I want to make sure I’m doing my part. I’ve been involved in grassroots initiatives where we get women involved in STEM at high schools and colleges. This is going to be a continuous process.” Success strategies for women have also evolved from simply being “more like the men,” says Leigh Nager, Vice President of mobile and networks commercial law at Samsung. “We’re starting to understand that women bring characteristics to the table that are good for business,” she adds. “But how did we get that recognition? We had to get representation in the first place.” Many of these themes resonate with AT&T’s Vice President of eCommerce, Maryanne Cheung, who says that while being a woman in a largely male-led industry was once a “badge of honor” for her, the value of having a peer support group became critical, especially when she had concerns about starting a family. “I had a network I could reach out to and get advice from,” she recalls. “It’s important to recognize where we can show women more of our authentic selves at all stages of our lives. It’s something I’m really passionate about.” Tara Hernandez, engineering VP at MongoDB, acknowledges the support she has received and says that she, in turn, has a duty and obligation to “pay that forward.” She also echoes Nager’s view that there is a strong commercial argument for fostering an inclusive culture. “It’s not just about growing women in tech,” she concludes.
“It’s about recognizing that all of us bring something valuable that will lead to innovation, growth, and business success, all of which are ultimately in our best interests.” There’s still time to register for the next MongoDB Women’s Group event. Register to attend “Forging Your Path as a Woman in Tech” on October 13, 12:30pm-1:30pm or 3:30pm-4:30pm EDT. Interested in pursuing a career at MongoDB? We have several open roles on our teams across the globe, and we’d love for you to build your career with us.
Relational to NoSQL at Enterprise Scale: Lessons from Amazon
When most people think about Amazon, they think of the cloud. But the company was founded more than a decade before anyone was talking about the cloud. In fact, by 2002, when Amazon founder Jeff Bezos wrote a now-famous internal email directing all new software development to be designed around service-oriented architecture, Amazon was already a $5 billion enterprise. By 2017, Amazon was generating more than 50 times that annual revenue, and, like many enterprise organizations, the core of that revenue was driven by the monolithic services that formed the backbone of the business. Those monoliths didn’t go away overnight, and in 2017 and 2018, Amazon kicked off a massive RDBMS-to-NoSQL migration project called “Rolling Stone” to move about 10,000 RDBMS-backed microservices to NoSQL and to decompose the remaining monoliths into NoSQL-backed microservices. Amazon chose to use its own NoSQL database, but the lessons from that huge effort are valuable for any migration to a NoSQL or document database. In this article, I’ll share some of the insights gained about when and how to use NoSQL.

RDBMS costs

At the time of this migration, I ran the NoSQL Blackbelt Team for Amazon’s retail business, which was the center of excellence for the business and which developed most of the design patterns and best practices that Amazon uses to build NoSQL-backed application services today. In 2017, Amazon had more than 3,000 Oracle server instances, 10,000 application services, and 25,000 global developers, and almost the entire development team was well versed in relational database technology. The cost of the IT infrastructure driving the business, however, was spiraling out of control. As the team started to look for root causes, they quickly realized that the cost of the relational database management system (RDBMS) was a big line item. The infrastructure required to support RDBMS workloads was enormous and did not scale well to meet the needs of the company’s high-demand services.
Amazon had the biggest Oracle license and the largest RAC deployments in the world, and the cost and complexity of scaling services built on RDBMS was negatively affecting the business. As a result, we started looking at what we were actually doing in these databases. A couple of interesting things came out. We found that 70% of the access patterns we were running against the data involved a single row of data on a single table. Another 20% were on a range of rows on a single table. So, we weren’t running complex queries against the data at high velocity. In fact, the vast majority were just inserts and updates, but many of those were executed “transactionally” across disparate systems using two-phase commits to ensure data consistency. Additionally, the cost was very high for the other 10% of the access patterns because most were complex queries requiring multiple table joins.

Technology triggers

While the team was looking into these issues, they also noticed a trend in the industry: Per-core CPU performance was flattening, and the server processor industry was not investing enough in 5 nm fabrication technology to meet the efficiency increases described by Moore’s Law. This is one of the reasons why Amazon built its own processor. If you look at the history of data processing, you’ll see a series of peaks and valleys in what can be defined as “data pressure,” or the ability of a system to process the required amount of data at a reasonable cost and within a reasonable amount of time. When one of these dimensions is broken, it defines a “technology trigger” that signals the need to invent something. At Amazon, we saw that the cost efficiency of the relational database was declining while the TCO of high time-complexity queries was increasing as a result. Something had to change. Relational data platforms only scale well vertically, which means getting a bigger box.
Sooner or later, there is no bigger box, and the options to scale an RDBMS-backed system introduce either design complexity or time complexity. Sharding an RDBMS is hard to self-manage. And although distributed SQL insulates users from that complexity by providing things like distributed cross commits behind the API to maintain consistency, that insulation also comes at a cost, which can be measured in the time complexity of the queries running across the distributed backend. At the same time, the cost of storage was falling, and the promise of denormalized, low time-complexity queries in NoSQL was enticing, to say the least. Clearly, it was never going to get any cheaper to operate a relational database; it was only going to get more expensive. Thus, Amazon made the decision to undertake what may be the largest technology migration ever attempted and deprecate RDBMS technology in favor of NoSQL for all Tier 1 services.

A new approach to building NoSQL skills

Project Rolling Stone launched with great fanfare and had buy-in from all the right stakeholders. But things didn’t go well at first. Amazon’s developers were now using a database designed to operate without the complex queries they had always relied on, and the lack of in-house NoSQL data modeling expertise was crippling the migration effort. The teams lacked the skills needed to design efficient data models, so the early results from prototyped solutions were far worse than anticipated. To correct this situation, leadership created a center of excellence to define best practices and educate the broader Amazon technical organization; the NoSQL Blackbelt Team was formed under my leadership. The challenge before us was enormous. We had limited resources with global scope across an organization of more than 25,000 technical team members.
The traditional technical training approach built on workshops, brown bags, and hackathons did not deliver the required results because the Amazon organization lacked a core nucleus of NoSQL skills to build on. Additionally, traditional training tends to be sandboxed around canned problems that are often not representative of what the developers are actually working on. As a result, technical team members were completing those exercises without significant insight into how to use NoSQL for their specific use cases. To correct this, we reworked the engagement model. Instead of running workshops and hackathons, we used the actual solutions the teams were working on as the learning exercises. The Blackbelt Team executed a series of focused engagements across Amazon development centers, where we delivered technical brown bag sessions to advocate best practices and design patterns. Instead of running canned workshops, however, we scheduled individual design reviews with teams to discuss their specific workloads and prototype a data model they could then iterate on. The result was powerful. Teams gained actionable information they could build on, rather than general knowledge that might or might not be relevant to their use case. Over the next three years, Amazon migrated all Tier 1 RDBMS workloads to NoSQL and reduced the infrastructure required to support those services by more than 50%, while still maintaining a high business growth rate.

Watch Rick Houlihan’s full MongoDB World 2022 presentation, “From RDBMS to NoSQL at Enterprise Scale.”

When to use NoSQL: Looking at access patterns

When should you use NoSQL? I had to answer this question many times at Amazon, and the answer isn’t so clear-cut. A relational database is agnostic to the access pattern. It doesn’t care what questions you ask. You don’t have to know code, although some people would argue that SQL is code. You can theoretically ask a simple question and get your data.
Relational systems do that by being agnostic to every access pattern and by optimizing for none of them. The reality is that the code we write never asks random questions. When you write code, you’re doing it to automate a process — to run the same query a billion times a day, not to run a thousand random queries. Thus, if you understand the access patterns, you can start structuring the data in ways that are much easier for systems to retrieve while doing less work. This is the key. The only way to reduce the cost of data processing and the amount of infrastructure deployed is to do less work. OLTP (online transaction processing) applications are really the sweet spot for NoSQL databases. You’ll see the most cost efficiency here because you can create data models that mirror your access patterns and representative data structures that mirror your objects in the application layer. The idea is to deliver a system that is very fast at the high-velocity access patterns that make up the majority of your workload. I talk more about data access patterns and data modeling in a recent Ask Me Anything.

Making it all work

There’s a saying that goes, “Data is like garbage. You better know what you are going to do with it before you collect it.” This is where relationships come into play. Nonrelational data, to me, does not exist. I’ve worked with more than a thousand customers and workloads, and I’ve never seen an example of nonrelational data. When I query data, relationships become defined by the conditions of my query. Every piece of data we’re working with has some sort of structure. It has schema, and it has relationships; otherwise, we wouldn’t care about it. No matter what application you’re building, you’re going to need some kind of entity relationship diagram (ERD) that describes your logical data model, its entities, and how they’re related in order to understand how to model it.
Otherwise, you’re just throwing a bunch of bytes in a bucket and randomly selecting things. A relationship always exists between these things. In relational models, they’re typically modeled in third normal form (3NF). For example, in a typical product catalog, you’ll see one-to-one relationships between products and books, products and albums, and products and videos; one-to-many relationships between albums and tracks; and many-to-many relationships between videos and actors. This is a pretty simple ERD — we’re not even talking about any complex patterns. But suppose you want a list of all your products: you’d have to run three different queries with various levels of joins. That’s a lot going on. In a NoSQL database, you’re going to take all those rows and collapse them into objects. If you think about the primary access pattern of this workload, it’s going to be something like, “Get me the product by this ID,” or “Get me all the titles under this category.” Whenever you want the product, you typically want all the data for the product because you’re going to use it in a product window. If you put it all in one document, you no longer have to join those documents or rows. You can just fetch the data by product ID. If you think about what’s happening from a time-complexity perspective, when you have all that data in tables, your one-to-one joins won’t be so bad, but with a one-to-many, the time complexity starts expanding. Again, the examples mentioned here are fairly simple. When you start getting into nested joins, outer and inner, and other more complex SQL statements, you can imagine how much worse the time complexity becomes. That’s your CPU burning away, assembling data across tables. If you’re running a relational data model and you’re joining tables, that’s a problem.

Index and conquer

Let’s think about how we model those joins in NoSQL. To start, we have a key-value lookup on a product.
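As a rough sketch of that key-value lookup (the field names and sample values below are hypothetical, shown in Python for illustration), the 3NF product, album, and track rows collapse into one document, so a single fetch by product ID replaces a multi-table join:

```python
# Hypothetical product document: the 3NF rows collapsed into one object.
product = {
    "_id": "B0000123",            # product ID: the primary access pattern key
    "type": "album",
    "title": "Love You Live",
    "artist": "The Rolling Stones",
    "price": 12.99,
    "tracks": [                   # one-to-many relationship embedded in place
        {"no": 1, "title": "Intro"},
        {"no": 2, "title": "Honky Tonk Women"},
    ],
}

# "Get me the product by this ID" becomes a single lookup instead of three
# queries with joins; with a MongoDB driver this would be
# collection.find_one({"_id": "B0000123"}).
catalog = {product["_id"]: product}
result = catalog["B0000123"]
```

The point of the sketch is the shape: everything the product window needs travels together under one key.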
But we can also create an array of embedded documents called “target” that contains all the things the product is related to, as shown in Figure 1. It contains metadata and anything about the product you need when you query by product ID. Now that we’re using embedded documents, there’s no more time complexity. It’s all an index lookup. It can be one-to-one, one-to-many, many-to-many — it doesn’t matter. As long as the aim is “get the document,” it’s still an index lookup.

Figure 1: Creating an array called “target” eliminates the need to join data from different rows, columns, or tables.

Of course, a lot more goes into an application than an index lookup. Remember, 70% of our access patterns at Amazon were for a single row of data, and 20% were for a range of rows on a single table. For more complex access patterns, we’re going to need more dimensions. If, for example, we’re running a query for all the books by a given author or all people related to “x,” this will require adding more dimensions, or documents, to the collection. We can create documents for other actors who were in a collection of movies, directors of all the movies, songs from the movies, how these songs relate to other entities in this collection, writers, producers, artists who performed the songs, and all the albums those songs appeared on, as shown in Figure 2.

Figure 2: Create more dimensions by adding documents to the collection.

Now, if I index the “target” array — multikey arrays are one of the best things about MongoDB and document databases — I can create essentially a B-tree lookup structure of those “target” IDs and join all those documents and all of those dimensions, as shown in Figure 3. Now I can select, for example, where the target ID is Mary Shelley and get everything she’s related to — the books, people, critiques of her work. Where the target ID is a song title, I can get all the information about that song.
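The single-collection pattern just described can be simulated in plain Python (the documents and IDs below are hypothetical; with a real driver, a multikey index on "target.id" — for example, collection.create_index("target.id") in PyMongo — is what makes this an index lookup rather than a scan):

```python
# Hypothetical documents in one collection, each carrying a "target" array
# of the entity IDs it relates to. Indexing that array lets "everything
# related to X" resolve as a single index lookup instead of a join.
collection = [
    {"_id": "book:frankenstein", "type": "book",
     "target": [{"id": "author:mary-shelley"}]},
    {"_id": "critique:01", "type": "critique",
     "target": [{"id": "author:mary-shelley"}, {"id": "book:frankenstein"}]},
    {"_id": "song:theme-01", "type": "song",
     "target": [{"id": "movie:gothic"}, {"id": "album:soundtrack-01"}]},
]

def related_to(entity_id):
    # Simulates the index scan: select every document whose "target"
    # array contains the given ID, regardless of document type.
    return [d for d in collection
            if any(t["id"] == entity_id for t in d["target"])]

shelley_docs = related_to("author:mary-shelley")
```

One filter over one collection returns books, critiques, or songs alike, which is exactly the "index as join mechanism" idea.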
Figure 3: Multikey arrays create what is essentially a B-tree lookup structure, joining all related documents.

Essentially, we’re using the index as a join mechanism, which is a critical distinction in NoSQL. At AWS, many teams came to me and told me that NoSQL doesn’t work. The key thing to understand, however, is that if you index documents stored in the same table or collection on a common dimension that shares the same value, you’ve essentially eliminated the need for the join — which is what the relational database does when it joins on the same indexed value across multiple tables. You don’t want to join unindexed columns in a relational database because it will incur a lot of overhead. You want to index those attributes, put pointers to parent objects in child tables, and then join on those IDs. With NoSQL, we’re essentially placing all those items in a single table and indexing on the ID. This approach also eliminates the time complexity. If all those documents share a common table, and they’re indexed on a common attribute, the time complexity is O(log N). Seventy percent of the overhead of handling a request from a database is not getting the data. It’s managing the connection, marshaling the data, and moving it back and forth across the TCP/IP stack. So, if I can eliminate one request from a transaction, I’m going to reduce the overhead of that transaction.

Conclusion

Data that is accessed together should be stored together. That is the mantra we’ve always espoused at MongoDB. Once we started learning how to use NoSQL at Amazon, we started having better results. We did that through regularly scheduled training sessions where we could teach the fundamentals of NoSQL using our own workloads. That’s what my developer advocacy team at MongoDB does now with customers. We provide templates for how to model data for their workloads to help them do it themselves.
MACH Aligned for Retail: Cloud-Native SaaS
MongoDB is an active member of the MACH Alliance, a non-profit group of technology companies fostering the adoption of composable architecture principles that promote agility and innovation. Each letter in the MACH acronym corresponds to a different concept that should be leveraged when modernizing heritage solutions and creating brand-new experiences: MACH stands for Microservices, API-first, Cloud-native SaaS, and Headless. In previous articles in this series, we explored the importance of Microservices and the API-first approach. Here, we will focus on the third principle championed by the alliance: Cloud-native SaaS. Let’s dive in.

What is cloud-native SaaS?

Cloud-native SaaS solutions are vendor-managed applications developed in and for the cloud, leveraging all the capabilities the cloud has to offer, such as fully managed hosting, built-in security, auto-scaling, cross-regional deployment, automatic updates, built-in analytics, and more.

Why is cloud-native SaaS important for retail?

Retailers are pressed to transform their digital offerings to meet rapidly shifting consumer needs and remain competitive. Traditionally, this means establishing areas of improvement for your systems and instructing your development teams to refactor components to introduce new capabilities (e.g., analytics engines for personalization or mobile app support) or to streamline architectures to make them easier to maintain (e.g., moving from monolith to microservices). These approaches can yield good results but require a substantial investment in time, budget, and internal technical knowledge. Now, retailers have an alternative tool at their disposal: cloud-native SaaS applications. These solutions are readily available off the shelf and require minimal configuration and development effort.
Adopting them as part of your technology stack can accelerate the transformation and time to market of new features, without requiring specific in-house technical expertise. Many cloud-native SaaS solutions focused on retail use cases are available (see Figure 1), including Vue Storefront, which provides a front-end presentation layer for ecommerce, and Amplience, which enables retailers to customize their digital experiences.

Figure 1: Some MACH Alliance members providing retail solutions.

At the same time, in-house development should not be totally discarded, and you should aim to strike the right balance between the two options based on your objectives. Figure 2 shows the pros and cons of the two approaches:

Figure 2: Pros and cons of cloud-native SaaS and in-house approaches.

MongoDB is a great fit for cloud-native SaaS applications

MongoDB’s product suite is cloud-native by design and is a great fit if your organization is adopting this principle, whether you prefer to run your database on-premises, leveraging MongoDB Community and Enterprise Advanced, or as SaaS with MongoDB Atlas. MongoDB Atlas, our developer data platform, is particularly suitable in this context. It supports the three major cloud providers (AWS, GCP, Azure) and leverages the cloud platforms’ features to achieve cloud-native principles and design:

- Auto-deployment & auto-healing: Database clusters are provisioned, set up, and healed automatically, reducing operational and DBA efforts.
- Automatically scalable: Built-in auto-scaling capabilities enable the database RAM, CPU, and storage to scale up or down depending on traffic and data volume. A MongoDB Serverless instance abstracts the infrastructure even further, so you pay only for the resources you need.
- Globally distributed: The global nature of the retail industry requires data to be efficiently distributed to ensure high availability and compliance with data privacy regulations, such as GDPR, while implementing strict privacy controls. MongoDB Atlas leverages the flexibility of the cloud with its replica set architecture and multi-cloud support, meaning that data can be easily distributed to meet complex requirements.
- Secure from the start: Network isolation, encryption, and granular auditing capabilities ensure data is only accessible to authorized individuals, thereby maintaining confidentiality.
- Always up to date: Security patches and minor upgrades are performed automatically with no intervention required from your team. Major releases can be integrated effortlessly, without modifying the underlying OS or working with package files.
- Monitorable and reliable: MongoDB Atlas provides a set of utilities for real-time reporting of database activities, letting you monitor and improve slow queries, visualize data traffic, and more. Backups are also fully managed, ensuring data integrity.

Independent Software Vendors (ISVs) increasingly rely on capabilities like these to build cloud-native SaaS applications addressing retail use cases. For example, Commercetools offers a fully managed ecommerce platform underpinned by MongoDB Atlas (see Figure 3). Their end-to-end solution provides retailers with the tools to transform their ecommerce capabilities in a matter of days, instead of building a solution in-house. Commercetools is also a MACH Alliance member, fully embracing the composable architecture paradigms explored in this series. Adopting Commercetools as your ecommerce platform of choice lets you automatically scale your ecommerce as traffic increases, and it integrates with many third-party systems, ranging from payment platforms to front-end solutions.
Additionally, its headless nature and strong API layer allow your front-end to be adapted based on your brands, currencies, and geographies. Commercetools runs on and natively ingests data from MongoDB. Leveraging MongoDB for your other home-grown applications means that you can standardize your data estate, while taking advantage of the many capabilities that the MongoDB data platform has to offer. The same principles can be applied to other SaaS solutions running on MongoDB. Figure 3: MongoDB Atlas and Commercetools capabilities. Find out more about the MongoDB partnership with Commercetools . Learn how Commercetools enabled Audi to integrate its in-car commerce solution and adapt it to 26 countries . MongoDB supports your home-grown applications MongoDB offers a powerful developer data platform, providing the tools to leverage composable architecture patterns and build differentiating experiences in-house. The same benefits of MongoDB’s cloud-native architecture explored earlier are also applicable in this context and are leveraged by many retailers globally, such as Conrad Electronics, running their B2B ecommerce platform on MongoDB Atlas . Summary Cloud-native principles are an essential component of modern systems and applications. They support ISVs in developing powerful SaaS applications and can be leveraged to build proprietary systems in-house. In both scenarios, MongoDB is strongly positioned to deliver on the cloud-native capabilities that should be expected from a modern data platform. Stay tuned for our final blog of this series on Headless and check out our previous blogs on Microservices and API-first .
How a Data Mesh Facilitates Open Banking
Open banking shows signs of revolutionizing the financial world. In response to pressure from regulators, consumers, or both, banks around the world continue to adopt the central tenet of open banking: Make it easy for consumers to share their financial data with third-party service providers and allow those third parties to initiate transactions. To meet this challenge, banks need to transition from sole owners of financial data and the customer relationship to partners in a new, distributed network of services. Instead of competing with other established banks, they now compete with fintech startups and other non-bank entities for consumer attention and the supply of key services. Despite fundamental shifts in both the competition and the customer relationship, however, open banking offers a huge commercial opportunity, which we’ll look at more closely in this article. After all, banks still hold the most important currency in this changing landscape: trust. Balancing data protection with data sharing Established banks hold a special position in the financial system. Because they are long-standing, heavily regulated, and backed by government agencies that guarantee deposits (e.g., the FDIC in the United States), established banks are trusted by consumers over fintech startups when it comes to making their first forays into open banking. A study by Mastercard of 4,000 U.S. and Canadian consumers found that the majority (55% and 53%, respectively) strongly trusted banks with their financial data. Only 32% of U.S. respondents and 19% of Canadians felt the same way about fintech startups. This position of trust extends to the defensive and risk-averse stance of established banks when it comes to sharing customer data. Even when sharing data internally, these banks have strict, permission-based data access controls and risk-management practices. They also maintain extensive digital audit trails. 
Open banking challenges these traditional data access practices, however, moving banks to a model where end customers are empowered to share their sensitive financial data with a growing number of third parties. Some open banking standards, such as Europe’s Payment Services Directive (PSD2), specifically promote informed-consent data sharing, further underlining the shift to consumers as the ultimate stewards of their data. At the same time, banks must comply with evolving global privacy laws, such as Europe’s General Data Protection Regulation (GDPR). These laws add another layer of risk and complexity to data sharing, granting consumers (or “data subjects” in GDPR terms) the right to explicit consent before data is shared, the right to withdraw that consent, data portability rights, and the right to erasure of that data — the famed “right to be forgotten.” In summary, banks are under pressure from regulators and consumers to make data more available, and customers now make the final decision about which third parties will receive that data. Banks are also responsible for managing:

- Different levels of consent for different types of data
- The ability to redact certain sensitive fields in a data file, while still sharing the file
- Compliance with data privacy laws, including "the right to be forgotten"

The open opportunity for banks

In spite of the competition and added risks for established banks, open banking greatly expands the global market of customers, opens up new business models and services, and creates new ways to grow customer relationships. In an open banking environment, banks can leverage best-of-breed services from third parties to bolster their core banking services and augment their online and mobile banking experiences. Established banks can also create their own branded or “white label” services, like payment platforms, and offer them as services for others to use within the open banking ecosystem.
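Returning to the data-sharing responsibilities above, the field-level redaction requirement can be sketched in a few lines (the record shape, field names, and consent model here are hypothetical illustrations, not drawn from any particular bank's schema):

```python
# Hypothetical customer record held by the bank.
customer = {
    "customer_id": "C-1042",
    "name": "A. Example",
    "iban": "DE89370400440532013000",     # sensitive: not consented
    "balance": 2450.10,                    # sensitive: not consented
    "transactions": [{"date": "2022-06-01", "amount": -49.99}],
}

# Fields this customer has consented to share with one specific third party.
consented_fields = {"customer_id", "name", "transactions"}

def redact(document, allowed):
    # Keep only the fields the data subject consented to share,
    # while still sharing the rest of the file.
    return {k: v for k, v in document.items() if k in allowed}

shared = redact(customer, consented_fields)
```

In practice, the consent set would differ per third party and per consent level, and withdrawal of consent simply shrinks the allowed set for future shares.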
For customers, the ability of third parties to get access to a true 360-degree view of their banking and payment relationships creates new insights that banks would not have been able to generate with just their own data. Given the risks, and the huge potential rewards, how do banks satisfy the push and pull of data sharing and data protection? How do they systematically collect, organize, and publish the most relevant data from across the organization for third parties to consume? Banks need a flexible data architecture that enables the deliberate collection and sharing of customer data both internally and externally, coupled with fine-grained access, traceability, and data privacy controls down to the individual field level. At the same time, this new approach must also provide a speed of development and flexibility that limits the cost of compliance with these new regulations and evolving open banking standards. Rise of the data mesh Open banking requires a fundamental change in a bank’s data infrastructure and its relationship with data. The technology underlying the relational databases and mainframes in use at many established banks was first developed in the 1970s. Conceived long before the cloud computing era, these technologies were never intended to support the demands of open banking, nor the volume, variety, and velocity of data that banks must deal with today. Banks are overcoming these limitations and embracing open banking by remodeling their approach to data and by building a data mesh using a modern developer data platform. What is a data mesh? A data mesh is an architectural framework that helps banks decentralize their approach to sharing and governing data, while also enabling self-service consumption of that data. It achieves this by grouping a bank’s data into domains. Each domain in a data mesh contains related data from across the bank. 
For example, a "consumer" domain may contain data about accounts, addresses, and relationship managers from across every department of the bank. Each data domain is owned by a different internal stakeholder group or department within the bank, and these owners are responsible for collecting, cleansing, and distributing the data in their domain across the enterprise and to consumers. With open banking, domain owners are also responsible for sharing data with third parties. This decentralized, end-to-end approach to data ownership encourages departments within the bank to adopt a “product-like” mentality toward the data within their domain, ensuring that it is maintained and made available like any other service or product they deliver. For this reason, the term data-as-a-product is synonymous with data mesh. Data domain owners are also expected to:

- Create and maintain relevant reshaped copies of data, rather than pursue a single-source-of-truth or canonical model.
- Serve data by exposing data product APIs. This means doing the cleansing and curation of data as close as possible to the source, rather than moving data through complex data pipelines to multiple environments.

The successful implementation of a data mesh, and the adoption of a data-as-a-product culture, requires a fundamental understanding of localized data. It also requires proper documentation, design, management, and, most important, flexibility: the ability to extend the internal data model. The flexibility of the document model is, therefore, critical for success.

Conclusion

Open banking holds great potential for the future of the customer experience and will help established financial institutions meet ever-evolving customer expectations. Facilitated by a data mesh, you can open new doors for responsible, efficient data sharing across your financial institution, and this increase in data transparency leads to better outcomes for your customers—and your bottom line.
Want to learn more about the benefits of open banking? Watch the panel discussion Open Banking: Future-Proof Your Bank in a World of Changing Data and API Standards.
What’s New in Atlas Charts: Streamlined Data Sources
We’re excited to announce a major improvement to managing data sources in MongoDB Atlas Charts: Atlas data is now available for visualization automatically, with zero setup required. Every visualization relies on an underlying data source. In the past, Charts made adding Atlas data as a source fairly straightforward, but teams still needed to manually choose clusters and collections from which to power their dashboards. Streamlined data sources, however, eliminates the manual steps required to add data sources into Charts. This feature further optimizes your data visualization workflow by automatically making clusters, serverless instances, and federated database instances in your project available as data sources within Charts. For example, if you spin up a new cluster or collection and want to create a visual quickly, you can simply go into one of your dashboards and start building a chart immediately. Check out streamlined data sources in action: See how the new data sources experience streamlines your data visualization workflow in Charts.

Maintain full control of your data

Although all project data will be available automatically to project members by default, we know how important it is to be able to control what data can be used by your team. For example, you may have sensitive customer data or company financials in a cluster. Project owners maintain full control over limiting access to data like this when needed. As shown in the following image, with a few clicks, you can select any cluster or collection, confirm whether or not any charts are using a data source, and disconnect when ready. If you have collections that you want some of your team to access but not others, this can be easily achieved under Data Access in collection settings, as seen in the following image. With every release, our goal is to make visualizing Atlas data more frictionless and powerful. The streamlined data sources feature is a big step in this direction.
Building data visualizations just got even easier with Atlas Charts. Give it a try today! New to Atlas Charts? Get started by logging into or signing up for MongoDB Atlas, deploying or selecting a cluster, and activating Charts for free.
How to Leverage Enriched Queries with MongoDB 6.0
MongoDB introduces useful new functions and features with every release, and MongoDB 6.0, released this summer, offers many notable improvements, including deeper insights from enriched queries via the MongoDB Query API. This set of query enhancements was announced at MongoDB World 2022 by senior product manager Katya Kamenieva. You can watch her presentation below.

Watch Katya Kamenieva’s MongoDB World presentation on queries.

Users can now take advantage of upgraded operators and change stream features. In this post, we’ll look at several of these updates, along with examples of how you can put them to use.

Top N accumulators

With this new feature, users can compute the top items in each group based on sort criteria ($topN, $bottomN), the current order of documents ($firstN, $lastN), or the value of a field ($maxN, $minN). This functionality would be useful, for example, if you have a collection of restaurants with ratings, and you want to see the top three highest-rated restaurants for each type of cuisine. You can group by cuisine and use $topN to return the top three restaurants by rating.

Ability to sort arrays

The new $sortArray operator allows users to sort the elements of an array. For example, suppose you have posted content with hundreds of user comments, and you want to sort the comments based on how many likes they received. In this case, $sortArray can pull those comments and prioritize them to the top of the comments list.

Densification and gap-filling

These new additions to the aggregation framework help to build out time series data more completely. When attempting to create histograms of data over time, the new stages, $densify and $fill, allow you to fill gaps in that data to create smoother and more complete graphs using linear interpolation, last/next observed value carried forward, or a constant value.
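As a rough sketch of these two stages (shown as PyMongo-style pipeline dicts rather than run against a live server, with a plain-Python simulation of the "last observed value carried forward" method; the field names are hypothetical):

```python
# A $densify + $fill pipeline: add the missing days, then carry the last
# observed inventory value forward into the gaps.
pipeline = [
    {"$densify": {
        "field": "day",
        "range": {"step": 1, "bounds": [1, 8]},   # emit days 1 through 7
    }},
    {"$fill": {
        "sortBy": {"day": 1},
        "output": {"inventory": {"method": "locf"}},  # last obs. carried fwd
    }},
]

# Weekly observations with gaps between them: day -> inventory.
observed = {1: 100, 4: 90, 7: 80}

def densify_and_fill(obs, start, stop):
    # Densify: one entry per day; fill: repeat the last seen value.
    out, last = [], None
    for day in range(start, stop):
        if day in obs:
            last = obs[day]
        out.append({"day": day, "inventory": last})
    return out

series = densify_and_fill(observed, 1, 8)
```

The simulation mirrors what the pipeline computes: a dense daily series in which each gap holds the previous observation.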
This capability is helpful, for example, if you want to create a graph that shows the amount of inventory in a warehouse every day for a year, but the inventory was only recorded once a week. The $densify stage will fill the gaps in the timeline, while $fill will produce values for the inventory data based on the previous observation.

Joining sharded collections

With this new feature, when joining collections using $lookup or performing a recursive search with $graphLookup, the collections on both sides can be sharded. Before 6.0, only the originating collection could be sharded. An example use case is enriching records in an “accounts” collection with the list of corresponding orders stored in an “orders” collection. In the past, only the “accounts” collection could be sharded; starting with 6.0, both the “accounts” and “orders” collections can be sharded.

Change streams pre- and post-images

Change streams now offer point-in-time (PIT) pre- and post-image capabilities, allowing users to include the state of a document before and after a change in the output of the change stream. This functionality can be useful in many situations. For example, suppose a company is tracking flight times. If a flight is delayed, the system can compare the departure and arrival times before and after that delay and trigger an automatic rewrite of the schedule for the new flight timeline, including schedules for the entire crew.

Atlas Search across multiple collections

This improvement to MongoDB Atlas Search allows users to search across multiple collections with a single query by using $search inside the $unionWith or $lookup stages, returning results quickly with only one query.

Enriched queries are not the only improvements in MongoDB 6.0. Read about the 7 reasons to upgrade to MongoDB 6.0 and discover the possibilities.

Try MongoDB Atlas for Free Today
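To close out the enriched-query examples, here is a minimal sketch of the “accounts”/“orders” $lookup join described earlier, which as of 6.0 works even when both collections are sharded. The join field (account_id) is an assumption for illustration; the pure-Python function approximates what the stage produces.

```python
# Hypothetical $lookup stage enriching "accounts" with matching "orders".
# With MongoDB 6.0, both collections may be sharded; field names are
# illustrative, not from the article.
lookup_pipeline = [
    {"$lookup": {
        "from": "orders",
        "localField": "_id",
        "foreignField": "account_id",
        "as": "orders",
    }},
]

def left_join(accounts, orders):
    """Pure-Python approximation of the $lookup stage above."""
    by_account = {}
    for o in orders:
        by_account.setdefault(o["account_id"], []).append(o)
    # Each account gets an "orders" array, empty when nothing matches.
    return [dict(a, orders=by_account.get(a["_id"], [])) for a in accounts]

accounts = [{"_id": 1, "owner": "ACME"}, {"_id": 2, "owner": "Initech"}]
orders = [
    {"_id": 10, "account_id": 1, "total": 99.0},
    {"_id": 11, "account_id": 1, "total": 15.5},
]

enriched = left_join(accounts, orders)
# enriched[0]["orders"] has two orders; enriched[1]["orders"] is empty
```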
MongoDB Connector for Apache Kafka 1.8 Available Now
MongoDB has released version 1.8 of the MongoDB Connector for Apache Kafka with new monitoring and debugging capabilities. In this article, we’ll highlight key features of this release.

JMX monitoring

The MongoDB Connector works with Apache Kafka Connect to provide a way for users to easily move data between MongoDB and Apache Kafka. The connector is written in Java and now implements Java Management Extensions (JMX) interfaces that give you access to metrics reporting. These metrics make troubleshooting and performance tuning easier. JMX technology, which is part of the Java platform, provides a simple, standard way for applications to report metrics, with many third-party tools available to consume and present the data.

For those who might not be familiar with JMX monitoring, let’s look at a few key concepts. An MBean is a managed Java object that represents a particular component being measured or controlled. Each component can have one or more MBean attributes. The MongoDB Connector for Apache Kafka publishes MBeans under the “com.mongodb.kafka.connector” domain. Many open source tools are available to monitor JMX metrics, from the console-based JmxTerm to more feature-complete monitoring and alerting tools like Prometheus. JConsole is also available as part of the Java Development Kit (JDK).

Note: Regardless of your client tool, MBeans for the connector are only available when there are active source or sink configurations defined on the connector.

Visualizing metrics

Figure 1: Source task JMX metrics from JConsole.

Figure 1 shows some of the metrics exposed by the source connector using JConsole. In this example, a sink task was created, which by default is called “sink-task-0”. The applicable metrics are shown in the JConsole MBeans panel. A complete list of both source and sink metrics will be available in the MongoDB Kafka Connector online documentation shortly after the release of 1.8.
MongoDB Atlas is a great platform to store, analyze, and visualize monitoring metrics produced by JMX. If you’d like to try visualizing JMX metrics generated by the connector in MongoDB Atlas, check out jmx2mongo, a tool that continuously writes JMX metrics to a MongoDB time series collection. Once the data is in MongoDB Atlas, you can easily create charts from it, like the following:

Figure 2: MongoDB Atlas chart showing successful batch writes vs. writes greater than 100 ms.

Figure 2 shows the number of successful batch writes performed by a MongoDB sink task and the number of those batch writes that took longer than 100 ms to execute. Many other monitoring use cases are possible; check out the latest MongoDB Kafka Connector documentation for more information.

Extended debugging

Over the years, the connector team has collected requests from users to enhance error messages or provide additional debug information for troubleshooting. In 1.8, you will notice additional log messages and more descriptive errors. For example, before 1.8, if you set the copy.existing parameter, you might get the log message: “Shutting down executors.” This message is not clear. To address this lack of clarity, the message now reads: “Finished copying existing data from the collection(s).”

These debugging improvements, in combination with the new JMX metrics, will make it easier for you to gain insight into the connector and help troubleshoot issues you may encounter. If you have ideas for additional metrics or scenarios where additional debugging messages would be helpful, please let us know by filing a JIRA ticket.

For more information on the latest release, check out the MongoDB Kafka Connector documentation. To download the connector, go to the MongoDB Connector repository in GitHub or download it from the Confluent Hub.
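As a closing illustration of the kind of data behind a chart like Figure 2, the sketch below fabricates a few metric documents in the shape a bridge tool such as jmx2mongo might write to a time series collection, then counts total batch writes versus those slower than 100 ms. The field names (ts, task, batch_write_ms) and values are assumptions for illustration, not the tool’s actual schema.

```python
from datetime import datetime, timedelta, timezone

# Illustrative metric documents, one per batch write, resembling what a
# JMX-to-MongoDB bridge could store in a time series collection.
# Schema and values are hypothetical.
start = datetime(2022, 11, 1, tzinfo=timezone.utc)
metrics = [
    {"ts": start + timedelta(seconds=i),
     "task": "sink-task-0",
     "batch_write_ms": ms}
    for i, ms in enumerate([12, 45, 150, 30, 220, 80])
]

# The two series a Figure 2-style chart would plot:
# all successful batch writes vs. those slower than 100 ms.
total_writes = len(metrics)
slow_writes = sum(1 for m in metrics if m["batch_write_ms"] > 100)
# 6 total writes, 2 of them slower than 100 ms
```

In Atlas Charts, the same two counts could be plotted over time by binning the `ts` field and filtering on the latency field.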