MongoDB Blog

Articles, announcements, news, updates and more

MDBWomen: A Look into MongoDB’s Affinity Group for Women-Identifying Employees

MongoDB affinity groups are employee-led resource groups that bring together employees with similar backgrounds, interests, or goals. They play an important role in our company and culture. Our affinity groups build community and connections, help us raise awareness of issues unique to their members’ experiences, and offer networking and professional development opportunities. I sat down with some of the leaders of MDBWomen to learn more about their initiatives, impact, and plans for the future. What is MDBWomen? MDBWomen is a community of MongoDB employees identifying as women. We acknowledge that working women face many challenges and that not everyone experiences them in the same way. Our purpose is to connect and amplify the voices of working women at MongoDB by providing a space for support and advocacy. We understand that both work and nonwork conversations are important and use our time together to share experiences and build connections. We are women from all walks of life who want to create a safe space for discussing important topics. How did MDBWomen get started, and how has the group grown? MDBWomen began as a cohort of women within our North American recruiting organization. Although it was informal, it quickly became a recognized affinity group, but there was no group page within our intranet, no mission statement, and no globally friendly meetings outside of U.S. time zones. After a few years, an opportunity arose to reimagine the group, work on a mission statement, and expand from being just a social club to having a strategic plan for supporting women and impacting the business. Since its inception, MDBWomen has grown to just shy of 500 members globally, with chapters in India, Australia, and Ireland in addition to U.S. chapters in Palo Alto, California; Austin , Texas; and New York City. Wherever women are, MDBWomen helps activate them! What types of initiatives does MDBWomen organize? 
Our biggest initiatives typically take place during Women’s History Month. Every International Women’s Day (March 8), we host a companywide Purple Shirt Day to show support for women’s rights and raise awareness about the challenges working women still face around the world. In previous years, we’ve brought in spotlight speakers from outside the organization to discuss their personal experiences with being a woman leader in the tech industry. This year, MDBWomen organized a handful of events for Women’s History Month, including professional development workshops, panel events featuring speakers in sales and engineering, an empowering yoga flow and meditation, a Bollywood dance class, and a Kudoboard to share tips, words of wisdom, or experiences about promoting equality for women and employees who identify as LGBTQIA+. We are also aware of the particular challenges working mothers face. In an effort to destigmatize pregnancy and motherhood at work, we’ve partnered with one of our benefits providers, Carrot , to host sessions that discuss pathways to parenthood and fertility. It can be difficult to coordinate global events that all of our members are able to participate in, and we recognize that women face different challenges in different regions and cultures. Although many of our MDBWomen events are global, we also rely on the chapter leaders to coordinate initiatives in their region. Many chapters hold casual meetups along with networking events and other workshops throughout the year that allow women-identifying employees to connect with one another, find mentors, and upskill. 2019 International Women's Day Celebration in NYC How has participating in MDBWomen impacted some of our employees? We’ve had a lot of impactful follow up conversations after MDBWomen events. Our CIO Lena Smart gave a talk about imposter syndrome last year, and we had a great discussion afterwards. 
Knowing that you’re not alone, your voice is heard, and your feelings are valid is a big part of the support we give to our members. Our Carrot fertility sessions have allowed women to speak about things they normally wouldn’t talk about in a traditional work setting, and we were able to hear stories from women who had similar struggles and provide them with resources. It’s not just the events and speakers that have made an impact, but our individual members as well. Many of our members have found mentors within the group or connected with other women who have gone through similar experiences, and we love that we’re able to introduce women to one another across the company and across the globe. So many women have told their chapter leaders that they wouldn’t have received such a high level of support if it weren’t for MDBWomen. Read Jane Zirinsky’s story below to learn more about how MDBWomen has impacted her. Jane Zirinsky: In her Words One of the challenges many women face when planning their careers is building out space to also plan for a family. As soon as I hit my mid-twenties, I couldn’t help but notice all the studies, articles, and thought pieces on the so-called motherhood penalty that can affect women as they attempt to progress in their careers. I knew that I would have to be proactive in my career planning to avoid the dreaded plateau motherhood can unfortunately result in. However, one thing I didn’t know I needed to plan for was how to communicate to my boss and colleagues when I had a miscarriage. There really is no Emily Post guide for that! When I lost my pregnancy in the summer of 2020, I knew I couldn't hide it and that I would need support and understanding. However, I didn't know how to share this news with the people I worked with. Embarrassingly, my biggest concern was that I would make them uncomfortable. I felt vulnerable. Thankfully, my manager and I have built a strong relationship founded on trust and respect. 
She’s also a woman, and a friend, which made telling her much easier. My manager asked if I felt comfortable speaking to HR so that I could get access to the benefits available to me. Through our vendor, Cleo, U.S. employees can access grief counseling, support groups, and bereavement leave. I had no idea that this was an option for me and gratefully took advantage of the program. When they think of fertility benefits, many people think about hospital payments, parental leave, and childcare. It is so easy to forget that 25 percent of all pregnancies end in a miscarriage, and not a baby. I am a very outgoing, cheerful person, and there was a noticeable change in my energy levels after my miscarriage. I needed some time off to mourn, cry, breathe, heal, and process the complexities of all the emotions that come with losing a pregnancy. I learned that your body doesn’t care when you lose a pregnancy. It hits you with the full flood of postpartum hormones, which for many women (lucky me!) also includes the added onslaught of postpartum depression. I knew that these feelings were inevitable, and that people around me would notice something was off. Asking for help and being vulnerable is easier said than done. I always advise women and friends to reach out to their communities when they need support; so I did what I tell women to do all the time: I reached out to my community. I posted in our private, internal MDBWomen Slack channel about what I had gone through. Although it was challenging to be so vulnerable, it was the single best thing I could have done. I received an outpouring of support from MongoDB women across the world. They shared with me privately that I was not alone. I had more than twelve 1:1 conversations with other women who had lost a pregnancy. Some wanted to thank me for being brave and sharing my experience, some wanted to connect and cry, and some just wanted me to know them and to better know me. 
The single strongest tool I had to fight my depression was a feeling of connectedness and community. No matter how strong you are, nothing makes you feel more alone than depression. Add in the isolation of the COVID-19 pandemic, and that took my depression to another level. Had I not been brave, I would have missed the chance to connect with and support other women too. Now, I strive to be a resource for other women at MongoDB, whether it’s sharing information about access to benefits or proofreading emails that will alert leaders of the need for time off or additional support. I’m grateful that MDBWomen is a safe place to be open, share experiences, and receive the support and empowerment that every woman deserves.

Hear from Some of Our Chapters

North America

Led by Jane Zirinsky, Melanie Kyono, Megan Blancato, Alexandra Hills, Gigi Neuenfeldt, and Libby Firer.

The North America chapters have members in Palo Alto, Austin, New York, and many other remote locations across the U.S. and Canada. We’ve had women jump in and get involved during their first week at MongoDB alongside women who have been here for years. We believe strongly that empowered women empower women, and that you get what you give in communities like ours. Building a strong internal network provides support when facing challenges and gaining access to new opportunities.

As part of this network, we’ve created an internal Propel-Her group aimed at elevating MongoDB women through mentorship and shared experiences. Propel-Her at MongoDB will be launching small, goal-driven peer mentor groups focused on specific professional development goals such as internal branding, negotiation, self-advocacy, and networking, where the emphasis is on peer mentoring and skill sharing. We are also launching a speaker pipeline in concert with our women in sales groups, which helps to connect our membership with women leaders in other companies and industries to inspire and teach us.
With NYC and Palo Alto tech hubs being in our backyards, we strive to connect our members to the wider world of women in tech. Because MongoDB is headquartered in New York City, we have the advantage of access to the majority of our executive leadership team. One of our main goals has been to leverage that access to expand the connection our global members have with our C-suite. We do this via Q&A sessions with our executive team, sessions that spotlight women leaders and experts in their fields, and partnerships with our Recruiting and Diversity and Inclusion teams to ensure we can advocate for our members where the impact is greatest. Australia Led by Tammy Bailey and Jocelyn del Prado The Australian chapter of MDBWomen started just over a year ago, right before the COVID-19 pandemic. The women in Australia typically cannot participate in global MDBWomen events and meetings due to the time zone disparity, so we wanted to create a local community of women who could support one another. We brainstormed heaps of ideas and scheduled our kickoff event for International Women’s Day 2020, but the pandemic brought most of that to a halt. Despite this, we organized regular Zoom meetings that allowed us to connect, meet new hires, and generally get to know each other. We had a great lineup of events for Women’s History Month in 2021, and we plan to continue this momentum throughout the year. One of our goals moving forward is to engage women across various departments and roles within MongoDB. We plan to hold even more organized activities such as event sponsorships, welcoming and mentorship programs, ladies’ lunches, high teas, informal meetups, and yoga sessions. Another goal is to create opportunities for collaboration and friendships with women in other locations. The number of women employees in Australia has doubled over the past year, and we’re always working on ways to bring more extraordinary women into the organization. 
MDBWomen Australia is a place to have your voice heard and make a difference, and we are excited to continue growing our group of amazing women in Australia!

MDBWomen Australia celebrating Purple Shirt Day virtually in 2021

India

Led by Palki Sood and Neha Mukherjee

We joined MongoDB one month apart from each other and reached out separately to our office site leader, Amit Babbar, with our ideas and vision of forming an employee affinity group specifically for women in India. He connected the two of us with each other in August 2019, and the rest is history! India became the first established chapter of MDBWomen outside of North America.

Our vision was to build a network of trust and a strong support system for all employees who identify as women in India. We believe that empowered women empower women. To add a local touch, we came up with the moniker “MongoWomaniya,” which is a fun way of representing our group and resonates with each member. We are proud that the logo we created for our group is now used as the logo for the global women’s group.

We’ve been able to help foster new friendships by providing group members with a platform to get to know each other better and be sounding boards for common issues. We even started our own recognition program called “MongoDB India Superwoman of the Quarter,” which highlights women employees who are not only star performers but are also succeeding in balancing their work-life responsibilities and leading the way with their impact.

Since the pandemic began, we have held multiple virtual engagement sessions addressing “taboo” topics such as polycystic ovary syndrome (PCOS). We also have held self-care sessions and collaborated with other affinity groups for activities such as Bollywood dancing. We have future plans to host more inspirational speakers, engage more “Womaniyas” to lead our regular meetings, and collaborate with recruiting to ensure we drive our diversity hiring goals.
Our main goal is to ensure MongoDB India is a top employer for women, driven by our inclusive and equitable culture.

MDBWomen India, AKA Womaniya, gather in the office prior to COVID-19

Ireland

Led by Rita Martins Rodrigues, Avril Murphy, and Amy McKeon

The Dublin chapter of MDBWomen provides a safe space for those identifying as women and allies to come together, share experiences, and help each other grow. Our goal is to support the women of our Dublin chapter with mentorship and upskilling programs, along with engaging our allies in open conversations in which we can help them demystify allyship and how it shows up at work. There is also an opportunity for the women of our chapter to connect with their peers in all of our major locations. We held our first event in April and are looking forward to establishing a community for the women of our Dublin team!

Interested in pursuing a career at MongoDB and joining MDBWomen? We have several open roles on our teams across the globe and would love for you to transform your career with us!

May 10, 2021

The Foundations of IT Modernization

Lately, I’ve been thinking a lot about the term “modernization.” It’s a broad term that means different things to different people. To some, modernization means migrating legacy systems to the cloud. To others, it means rewriting applications, containerizing, or embracing microservices architecture. And to others still, modernization is synonymous with an equally amorphous (and ubiquitous) term: digital transformation.

However you define it, modernization is all the rage right now. IDC says these investments are growing at a compound annual growth rate of 15.5%, and will reach $6.8 trillion by 2023 (yeah, trillion, with a ‘t’). This frenzy of spending on technology, services, and skills is intended to bring aging systems and business processes up to date. In many cases, these investments are urgent and necessary, as companies of all shapes and sizes, in every industry, must accelerate their pace of innovation in order to survive.

But the work of modernization is complex, costly, and technically challenging. It’s like renovating every room in a sprawling estate while you’re still living in it. It’s hard to even know where to start.

To that end, I can’t help but think about the words of the CIO of a $30 billion insurance company who had already been on a modernization journey for years. He said: “We tried everything to accelerate innovation...but, in the end, it was our data platform that was holding us back.”

In other words, they were spending millions to fix up their estate, adding radiant heat, smart speakers, and a state-of-the-art home theater. But they were building on top of foundations that were first poured when disco was new. (I’m looking at you, relational databases, first conceived and implemented in the 1970s.)

In the digital economy, companies succeed or fail based on how fast they innovate. More often than not, that innovation takes the form of software and services, which in turn create value by storing, manipulating, and querying data.
And what do you use to store, manipulate, and query all that data? Your application data platform. Years ago, that just meant ‘database with some scripts around it.’ Those days are gone. Now, an application data platform has to supply speed, governance, security, availability, and more.

So let’s get back to my modernization metaphor. You can’t build new solid things on top of creaky, unstable, old things. We all know the old things I’m talking about: databases that make you structure your data in a way that isn’t natural, languages written to be so precise to the computer that they are inscrutable to developers, ‘roach motel’ storage systems that don’t store things in modern, open formats.

So if you want to modernize your infrastructure, or modernize your applications, or modernize the way you build software, shouldn’t you first modernize your data platform? Sure, it’s hard to renovate your house. But this is where you live. And if you want that house to last, make sure it’s built on solid foundations.

What does that mean? It means that it’s not enough to design and build the right apps. If you want to be truly modern, look at how you input and output your data, how you query, manipulate, and store it, and how you program against it. Get those things right, and you dramatically increase your pace of innovation.

No matter where you are in your own modernization journey, it’s not too late to do this. Don’t believe it? Hit me back on Twitter at @MarkLovesTech and I’ll show you how.

May 10, 2021
Mark Loves Tech

MongoDB Query API Webinar: FAQ

Last week we held a live webinar on the MongoDB Query API and our lineup of idiomatic programming language drivers. There were many great questions during the session, and in this post I want to share the most frequently asked ones with you. But first, here is a quick summary of what the MongoDB Query API is all about, in case you are unfamiliar with it.

What is MongoDB Query API?

MongoDB is built upon the document data model. The document model is designed to be intuitive, flexible, universal, and powerful. You can easily work with a variety of data, and because documents map directly to the objects in your code, it fits naturally in your app development experience.

MongoDB Query API lets you work with data as code and build any class of application faster by giving you extensive query capabilities natively in any modern programming language. Whether you’re working with transactional data, looking for search capabilities, or trying to run sophisticated real-time analytics, MongoDB Query API can meet your needs.

MongoDB Query API has some unique features like its expressive query language, primary and secondary indexes, powerful aggregations and transformations, on-demand materialized views, and more, enabling you to work with data of any structure, at any scale. Some key features to highlight:

Indexes

To optimize any workload and query pattern, you can take advantage of a large set of index types like multi-key (for arrays), wildcard, geospatial, and more, and index any field no matter how deeply nested it is within your documents. Fully featured secondary indexes are document-optimized and include partial, unique, case-insensitive, and sparse indexes.

Aggregation Pipeline

The aggregation pipeline lets you group, transform, and analyze your data to support any class of workload. You can choose from dozens of aggregation stages and over 200 operators to build modular and expressive pipelines.
You can also use low-code tools like MongoDB Compass to drag and drop stages, examine intermediate output, and export to your programming language of choice.

On-Demand Materialized Views

The powerful $merge aggregation stage allows you to combine the results of your aggregation pipeline with existing collections to update and enrich data without having to recompute your entire data set. You can output results to sharded and unsharded collections while simultaneously defining indexes on each view.

Geospatial and Graph

Utilize MongoDB’s built-in native ability to store and run queries against geospatial data. Use operators like $graphLookup to quickly traverse connected data sets.

These are just a few of the features we highlighted in the MongoDB Query API webinar. No matter what type of application you are thinking of building or managing, MongoDB Query API can meet your needs as the needs of your users and application change.

FAQs for MongoDB Query API

Here are the most common questions asked during the webinar:

Do we have access to the data sets presented in this webinar?
Yes, you can easily create a cluster and load the sample data sets into Atlas. Instructions on how to get started are here.

How can I access full-text search capabilities?
Text search is a standard feature of MongoDB Atlas. You can try it out using the sample data sets.

Does the VS Code plugin support aggregation?
Yes, it does. You can learn more about the VS Code plugin on our docs page.

If you need to pass variable values in the aggregation, say the price range from the app as an input, how would you do that?
This is no different from sending a query: because you construct the aggregation in your application, you simply fill in the relevant field with a value or variable from your code.

Is there a best-practices document for the MongoDB Query API on achieving stable performance with minimal resource use?
Yes, we have tips and tricks on optimizing performance by utilizing indexes, filters, and tools here.
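The FAQ answer about passing variable values into an aggregation can be illustrated with a minimal Python sketch. The field names `price` and `category` and the function name are hypothetical, chosen only for illustration; the pipeline is plain data, so the price range from the application is simply interpolated into the `$match` stage.

```python
from typing import Any, Dict, List


def build_price_range_pipeline(min_price: float, max_price: float) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline restricted to an inclusive price range.

    `price` and `category` are hypothetical field names used for illustration.
    """
    if min_price > max_price:
        raise ValueError("min_price must not exceed max_price")
    return [
        # Filter first so later stages only see documents in the range.
        {"$match": {"price": {"$gte": min_price, "$lte": max_price}}},
        # Count documents and average the price per category.
        {"$group": {"_id": "$category", "count": {"$sum": 1}, "avg_price": {"$avg": "$price"}}},
        # Highest average price first.
        {"$sort": {"avg_price": -1}},
    ]


pipeline = build_price_range_pipeline(10.0, 50.0)
# With a driver such as PyMongo, the list would be passed to collection.aggregate(pipeline).
```

Because the values are passed as data rather than concatenated into a query string, the same function can serve any range the application supplies.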
Does MongoDB support the use of multiple different indexes to meet the needs of a single query?
Yes, this can be accomplished by the use of compound indexes. You can learn more about it in our docs here.

If you work with big data and create a collection, is it smarter to create indexes first or after the collection is filled (regarding the time to create a collection)?
It is better to create the indexes first, as they will take less time to create if the collection is empty, but you still have the option to create an index once the data is in the collection. MongoDB’s indexing capabilities have several benefits:
When building indexes, there is no impact on your app’s availability since the index operation is online.
You have the flexibility to add and remove indexes at any time.
You can hide indexes to evaluate the impact of removing them before officially dropping them.

Where do I go to learn more?
Here are some resources to help you get started:
MongoDB Query API page
MongoDB University
MongoDB Docs

You can also check out the webinar replay here.
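The index answers above (compound indexes, and hiding an index before dropping it) can be sketched as the database command documents a driver would send. This is a sketch under assumptions: the collection name `products` and the fields `category` and `price` are hypothetical, and hiding an index assumes a server version that supports hidden indexes (MongoDB 4.4 or later).

```python
from typing import Any, Dict, List, Tuple


def create_index_command(collection: str, keys: List[Tuple[str, int]]) -> Dict[str, Any]:
    """Build a createIndexes command for a compound index.

    The name follows the usual field_direction convention, e.g. category_1_price_-1.
    """
    name = "_".join(f"{field}_{direction}" for field, direction in keys)
    return {"createIndexes": collection, "indexes": [{"key": dict(keys), "name": name}]}


def hide_index_command(collection: str, index_name: str, hidden: bool = True) -> Dict[str, Any]:
    """Build a collMod command that hides (or unhides) an existing index."""
    return {"collMod": collection, "index": {"name": index_name, "hidden": hidden}}


# Hypothetical collection and fields, for illustration only.
create_cmd = create_index_command("products", [("category", 1), ("price", -1)])
hide_cmd = hide_index_command("products", "category_1_price_-1")
# With PyMongo, either document could be sent with db.command(create_cmd).
```

Hiding the index first lets you observe query performance without it, and unhiding is cheap if the index turns out to be needed.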

May 7, 2021

How to Get Started with MongoDB Atlas and Confluent Cloud

Every year more and more applications are leveraging the public cloud and reaping the benefits of elastic scale and rapid provisioning. Forward-thinking companies such as MongoDB and Confluent have embraced this trend, building cloud-based solutions such as MongoDB Atlas and Confluent Cloud that work across all three major cloud providers.

Companies across many industries have been leveraging Confluent and MongoDB to drive their businesses forward for years. From insurance providers gaining a customer-360 view for a personalized experience to global retail chains optimizing logistics with a real-time supply chain application, the connected technologies have made it easier to build applications with event-driven data requirements. The latest iteration of this technology partnership simplifies getting started with a cloud-first approach, ultimately improving developer productivity when building modern cloud-based applications with data in motion.

Today, the MongoDB Atlas source and sink connectors are generally available within Confluent Cloud. With Confluent’s cloud-native service for Apache Kafka® and these fully managed connectors, setup of your MongoDB Atlas integration is simple. There is no need to install Kafka Connect or the MongoDB Connector for Apache Kafka, or to worry about scaling your deployment. All the infrastructure provisioning and management is taken care of for you, enabling you to focus on what brings you the most value: developing and releasing your applications rapidly.

Let’s walk through a simple example of taking data from a MongoDB cluster in Virginia and writing it into a MongoDB cluster in Ireland. We will use a Python application to write fictitious data into our source cluster.

Step 1: Set up Confluent Cloud

First, if you’ve not done so already, sign up for a free trial of Confluent Cloud. You can then use the Quick Start for Apache Kafka using Confluent Cloud tutorial to create a new Kafka cluster.
Once the cluster is created, you need to enable egress IPs and copy the list of IP addresses. This list of IPs will be used as an IP Allow list in MongoDB Atlas. To locate this list, select “Cluster Settings” and then the “Networking” tab. Keep this tab open for future reference: you will need to copy these IP addresses into the Atlas cluster in Step 2.

Step 2: Set Up the Source MongoDB Atlas Cluster

For a detailed guide on creating your own MongoDB Atlas cluster, see the Getting Started with Atlas tutorial. For the purposes of this article, we have created an M10 MongoDB Atlas cluster using the AWS cloud in the us-east-1 (Virginia) data center to be used as the source, and an M10 MongoDB Atlas cluster using the AWS cloud in the eu-west-1 (Ireland) data center to be used as the sink.

Once your clusters are created, you will need to configure two settings in order to make a connection: database access and network access.

Network Access

You have two options for allowing secure network access from Confluent Cloud to MongoDB Atlas: You can use AWS PrivateLink, or you can secure the connection by allowing only specific IP connections from Confluent Cloud to your Atlas cluster. In this article, we cover securing via IPs. For information on setting up using PrivateLink, read the article Using the Fully Managed MongoDB Atlas Connector in a Secure Environment.

To accept external connections in MongoDB Atlas via specific IP addresses, launch the “IP Access List” entry dialog under the Network Access menu. Here you add all the IP addresses that were listed in Confluent Cloud from Step 1.

Once all the egress IPs from Confluent Cloud are added, you can configure the user account that will be used to connect from Confluent Cloud to MongoDB Atlas. Configure user authentication in the Database Access menu.

Database Access

You can authenticate to MongoDB Atlas using username/password, certificates, or AWS identity and access management (IAM) authentication methods.
To create a username and password that will be used for connection from Confluent Cloud, select the “+ Add new Database User” option from the Database Access menu. Provide a username and password and make a note of this credential, because you will need it in Step 3 and Step 5 when you configure the MongoDB Atlas source and sink connectors in Confluent Cloud.

Note: In this article we are creating one credential and using it for both the MongoDB Atlas source and MongoDB Atlas sink connectors. This is because both of the clusters used in this article are from the same Atlas project.

Now that the Atlas clusters are created, the Confluent Cloud egress IPs are added to the MongoDB Atlas Allow list, and the database access credentials are defined, you are ready to configure the MongoDB Atlas source and MongoDB Atlas sink connectors in Confluent Cloud.

Step 3: Configure the Atlas Source

Now that you have two clusters up and running, you can configure the MongoDB Atlas connectors in Confluent Cloud. To do this, select “Connectors” from the menu, and type “MongoDB Atlas” in the Filters textbox.

Note: When configuring the MongoDB Atlas source and MongoDB Atlas sink, you will need the connection host name of your Atlas clusters. You can obtain this host name from the MongoDB connection string. An easy way to do this is by clicking on the "Connect" button for your cluster. This will launch the Connect dialog. You can choose any of the Connect options. For purposes of illustration, if you click on “Connect using MongoDB Compass,” you will see the following:

The highlighted part in the above figure is the connection hostname you will use when configuring the source and sink connectors in Confluent Cloud.

Configuring the MongoDB Atlas Source Connector

Selecting “MongoDbAtlasSource” from the list of Confluent Cloud connectors presents you with several configuration options.
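The connection host mentioned in the note above can also be pulled out of a connection string programmatically. The following is a minimal sketch using only the Python standard library; the cluster host name and credentials in the example are made up, and the helper assumes a single-host `mongodb+srv://` or `mongodb://` string.

```python
from urllib.parse import urlsplit


def connection_host(connection_string: str) -> str:
    """Return the host name portion of a single-host MongoDB connection string."""
    parts = urlsplit(connection_string)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError("not a MongoDB connection string")
    if not parts.hostname:
        raise ValueError("connection string has no host")
    # hostname drops the credentials and any port, and is lowercased.
    return parts.hostname


# Hypothetical connection string, for illustration only.
example = "mongodb+srv://demo_user:demo_password@cluster0.example.mongodb.net/Stocks?retryWrites=true"
print(connection_host(example))
```

The value printed is the part you would paste into the connector's connection host field; the username and password are entered separately.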
The “Kafka Cluster credentials” choice is an API-based authentication that the connector will use for authentication with the Kafka broker. You can generate a new API key and secret by using the hyperlink.

Recall that the connection host is obtained from the MongoDB connection string. Details on how to find this are described at the beginning of this section.

The “Copy existing data” choice tells the connector upon initial startup to copy all the existing data in the source collection into the desired topic. Any changes to the data that occur during the copy process are applied once the copy is completed.

By default, messages from the MongoDB source are sent to the Kafka topic as strings. The connector supports outputting messages in formats such as JSON and Avro.

Recall that the MongoDB source connector reads change stream data as events. Change stream event metadata is wrapped in the message sent to the Kafka topic. If you want just the message contents, you can set the “Publish full document only” option to true.

Note: For source connectors, the number of tasks will always be “1”; otherwise you will run the risk of duplicate data being written to the topic, because multiple workers would effectively be reading from the same change stream. To scale the source, you could create multiple source connectors and define a pipeline that looks at only a portion of the collection. This capability for defining a pipeline is not yet available in Confluent Cloud.

Step 4: Generate Test Data

At this point, you could run your Python data generator application and start inserting data into the Stocks.StockData collection at your source.
This will cause the connector to automatically create the topic “demo.Stocks.StockData.”

To use the generator, clone the stockgenmongo folder in the above-referenced repository and launch the data generation as follows:

python -c "< >"

Where the MongoDB connection URL is the full connection string obtained from the Atlas source cluster. An example connection string is as follows:

mongodb+srv://

Note: You might need to install pymongo and dnspython with pip first.

If you do not wish to use this data generator, you will need to create the Kafka topic first before configuring the MongoDB Atlas sink. You can do this by using the Add a Topic dialog in the Topics tab of the Confluent Cloud administration portal.

Step 5: Configuring the MongoDB Atlas Sink

Selecting “MongoDB Atlas Sink” from the list of Confluent Cloud connectors will present you with several configuration options.

After you pick the topic to source data from Kafka, you will be presented with additional configuration options.

Because you chose to write your data in the source by using JSON, you need to select “JSON” in the input message format. The Kafka API key is an API key and secret used for connector authentication with Confluent Cloud. Recall that you obtain the connection host from the MongoDB connection string. Details on how to find this are described previously at the beginning of Step 3.

The “Connection details” section allows you to define behavior such as creating a new document for every topic message or updating an existing document based upon a value in the message. These behaviors are known as document ID and write model strategies. For more information, check out the MongoDB Connector for Apache Kafka sink documentation.

If the order of the data in the sink collection is not important, you could spin up multiple tasks to gain an increase in write performance.

Step 6: Verify Your Data Arrived at the Sink

You can verify the data has arrived at the sink via the Atlas web interface.
Navigate to the collection data via the Collections button. Now that your data is in Atlas, you can leverage many of the Atlas platform capabilities such as Atlas Search, Atlas Online Archive for easy data movement to low-cost storage, and MongoDB Charts for point-and-click data visualization. Here is a chart created in about one minute using the data generated from the sink cluster. Summary Apache Kafka and MongoDB help power many strategic business use cases, such as modernizing legacy monolithic systems, single views, batch processing, and event-driven architectures, to name a few. Today, Confluent Cloud and MongoDB Atlas provide fully managed solutions that let you focus on the business problem you are trying to solve rather than spinning your wheels on infrastructure configuration and maintenance. Register for our joint webinar to learn more!
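If you would rather not clone the repository's generator from Step 4, a small stand-in can produce comparable test data. The following is a minimal sketch, not the generator from the repository: the field names (symbol, price, tx_time) and both function names are assumptions chosen for illustration, and only the Stocks.StockData namespace comes from the walkthrough above. It relies on the pymongo driver (plus dnspython for mongodb+srv:// URLs), imported lazily inside the insert function.

```python
import random
from datetime import datetime, timezone


def make_stock_tick(symbol, rng):
    """Build one hypothetical stock document; the field names are illustrative."""
    return {
        "symbol": symbol,
        "price": round(rng.uniform(10.0, 500.0), 2),
        "tx_time": datetime.now(timezone.utc),
    }


def insert_ticks(connection_url, symbols, count=100):
    """Insert `count` random ticks into Stocks.StockData on the source cluster."""
    # Imported lazily so make_stock_tick works without the driver installed.
    from pymongo import MongoClient  # pip install pymongo dnspython

    rng = random.Random()
    client = MongoClient(connection_url)
    collection = client["Stocks"]["StockData"]
    docs = [make_stock_tick(rng.choice(symbols), rng) for _ in range(count)]
    result = collection.insert_many(docs)
    client.close()
    return len(result.inserted_ids)
```

Calling insert_ticks with the full Atlas connection string and a list of ticker symbols should give the source connector change events to publish, after which the documents can be checked in the sink collection as described in Step 6.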

May 6, 2021

Use x509 certificate-based authentication with MongoDB and Apache Kafka

Kafka has emerged as a popular event streaming platform. The inherent "pub/sub" model can be viewed as a method for moving data between systems. As such, MongoDB offers a Kafka connector, enabling data in Kafka topics to be copied into a MongoDB cluster (the sink). Similarly, the connector enables data movement from a MongoDB cluster (the source) into Kafka topics. To access data securely, certificate-based X.509 authentication is a natural choice for server-to-server authentication scenarios with Kafka and MongoDB. Certificates avoid the need to store or manage usernames and passwords in database connection strings. Such user credentials could be inadvertently exposed if they are hard-coded in configuration files, for example. An X.509 certificate is a structured, binary record. This record consists of several key and value pairs. X.509 certificates use the widely accepted international X.509 public key infrastructure (PKI) standard. The use of certificates prevents user credential exposure. Authentication requests with certificates verify that any public key presented by a client or another member of the cluster belongs to that client or member. The X.509 certificate method for authentication is more secure than conventional password-based authentication because each server machine needs its own dedicated key to participate in the cluster. For use with secure TLS/SSL connections, MongoDB supports X.509 certificate authentication, allowing clients to use public key infrastructure in lieu of SCRAM (username and password). The certificate encodes two very important pieces of information: the server's public key and a digital signature that can be used to confirm the certificate's authenticity. Additionally, the certificate will include metadata used by the Certificate Authority to track the certificate and provide guidelines on how the public key can be used. 
Using the server's public key, the client and server are able to negotiate a shared symmetric key securely, which can be used to secure communications. Users can either generate their own certificates and keys (self-managed) or use the Atlas PKI. In either case, a project-specific CA private and public key pair is generated first, and then a per-user private key and signed X.509 identity certificate are created. If using self-managed X.509 infrastructure, you'll need to upload your CA public key certificate into your Atlas project. If using Atlas-managed X.509 infrastructure, you'll need to download the project private key and provide that to your Kafka Connect service. This signed certificate is then pushed to each server member in your Atlas cluster. The below diagram shows the deployment of a standard three-node replica set and client using X.509 authentication: In non-production environments, the basic SCRAM authentication method may be most suitable. However, for production environments or server-to-server scenarios such as a Kafka-MongoDB integration, X.509 authentication is the recommended mechanism. To use X.509 authentication for server-to-server scenarios, first confirm that you are able to authenticate to an Atlas cluster using X.509 certificates. Then follow the steps below. Prerequisites: OpenSSL must be installed; project-level CA and user certificates must be created in PEM format; if using Atlas-managed certificates, you need a user-specific client certificate (see X.509 tab: ); if using self-managed X.509 auth, you will need to create and upload your CA public key to Atlas (see ), and have a user-specific client certificate ready. Ensure that you have installed the MongoDB Kafka Connector and understand how to use it with Kafka Connect. Then follow these steps: Obtain the client user certificate from your system administrator (or from Atlas). 
In this example, the user certificate is stored in the PEM file kafkaclient-X509-cert.pem and will be associated with the Atlas database user kafka-svc . Convert the PEM file to a password-protected PKCS12 formatted certificate by running this command: openssl pkcs12 -export -in kafkaclient-X509-cert.pem -out kafkaclient-X509-cert.p12 -password pass:mypassword Copy the PKCS12 certificate ( kafkaclient-X509-cert.p12 ) to the server where Kafka Connect is running. Note the full path of the PKCS12 certificate location. Update the Kafka Connect configuration in the KAFKA_OPTS environment variable: export KAFKA_OPTS="<path to kafkaclient-X509-cert.p12>" Restart Kafka Connect. Update the MongoDB Connector configuration to use a connection URI with the following parameter options: connection.uri: "mongodb+srv://<mongodb-host>/test?authSource=%24external&authMechanism=MONGODB-X509&subjectName=kafka-svc" Redeploy the MongoDB connector using the Kafka Connect REST API, with the above configuration for the connection URI. Download the latest MongoDB Connector for Apache Kafka 1.5 from the Confluent Hub! Read the MongoDB Connector for Apache Kafka documentation. Questions/Need help with the connector? Ask the Community.
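The connection URI in the steps above percent-encodes the $external authentication database as %24external. As a hedged illustration of how such a URI and a matching sink configuration could be assembled programmatically, here is a small Python sketch. The function names, the host, the connector's topic, database, and collection values are all hypothetical placeholders; the sketch only sets the authSource and authMechanism options and leaves out the subjectName parameter shown above. It builds the JSON payload but does not call the Kafka Connect REST API.

```python
import json
from urllib.parse import urlencode


def build_x509_uri(host, database="test"):
    """Return a mongodb+srv URI that requests X.509 authentication."""
    options = {
        "authSource": "$external",      # urlencode turns "$" into "%24"
        "authMechanism": "MONGODB-X509",
    }
    return f"mongodb+srv://{host}/{database}?{urlencode(options)}"


def build_sink_config(uri, topic, database, collection):
    """Assemble a connector config dict; all values passed in are placeholders."""
    return {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "connection.uri": uri,
        "topics": topic,
        "database": database,
        "collection": collection,
    }


if __name__ == "__main__":
    uri = build_x509_uri("cluster0.example.mongodb.net")  # hypothetical host
    config = build_sink_config(uri, "example.topic", "test", "kafka_sink")
    # This JSON body is what would be sent to Kafka Connect when redeploying.
    print(json.dumps(config, indent=2))
```

Generating the URI this way avoids hand-encoding mistakes such as leaving a literal $ in the query string.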

May 5, 2021

Built With MongoDB: Go

“Social media was supposed to augment our friendships and give us more to talk about — but it’s actually starting to replace our relationships,” laments Sean Conrad , the co-founder and CEO of Go. After 10 years of working at large tech companies and bootstrapping a multimillion-dollar gaming company, Sean started building Go , a social app focused on helping friends create plans to hang out in person. Combining data science, social networking, and event aggregation, Go provides users with a custom, curated feed of cool things to do and friends to do them with. Go is live in New Zealand and (very recently) Australia with over 40,000 downloads and 500 businesses. The startup has raised $6.7 million in seed funding and has been building with MongoDB from the start. For this edition of #BuiltWithMongoDB, we spoke with Sean about the business, being a second-time founder and CEO, and his experience with MongoDB. MongoDB: You actually started building during the COVID-19 pandemic. How did that impact the product, given that your mission is to bring people together in real life? Sean: It impacted us in so many ways. We researched the space throughout 2019, and started building the app in early 2020, planning for a fall release in Portland or Los Angeles. And then the pandemic hit the United States. We realized it was jokingly bad that we were building an app to bring people together just when social distancing was becoming a requirement. For a month, we contemplated a lot of possible ideas, and we had some cool ones, but our passion was really about making offline connections stronger. We spent the summer working on the product, and then launched in New Zealand because that country had handled the pandemic well and reopened. The product has been a huge success in New Zealand, and after iterating on it, we recently launched in Australia. Our plan is to launch in the United States, starting from Los Angeles, during the summer of 2021. 
MongoDB: You mentioned that you've used MongoDB before. What has your experience been like with MongoDB as a 2x founder? Sean: At my previous company, we scaled up to about 30 million downloads, and we ran it on MongoDB. We were not database experts, and it was very easy to use. It was 2013 when we started using MongoDB. We had our hiccups and had to learn what indexes were, but we became really comfortable with the platform. For Go, we picked MongoDB out of comfort. When we got started with Go, MongoDB Realm was still in beta. We would’ve used it had it been around, but we built our first product on Firebase Firestore. Firestore ended up being a bit limiting for us because we wanted to build a feed-based system (in Go, it’s showcasing a series of events or things to do that are interesting to you and your friends), so a lot of filters are necessary. That requires many different types of unstructured data that’s difficult to put into a simple schema. Managing these things demands a lot of documents and data duplication, and MongoDB was a good fit for that. We like that Atlas has full-text search built on Apache Lucene , which is a powerful text search library. We are just getting into that. In addition, most of our compute runs on AWS. We use a lot of containerized stuff on AWS, and a little bit of Lambda stuff, and we’re moving to a serverless environment. I’m not sure what the future of Go is, but I’m confident MongoDB will play a part in it. Our mobile app is written in Flutter, Google’s competitor to React Native. We like that quite a bit. MongoDB: What is the last technical podcast you enjoyed? Sean: It’s All About Widgets , a podcast about Flutter. We’ve got a really talented group of developers on our team — two of them are ranked in the top 15 Stack Overflow Flutter contributors! One of our developers Raouf Rahiche spoke on their second episode . It was really cool to hear a team member talking on this podcast. 
MongoDB: As a second-time founder, what is one thing that was unexpected for you in building this business? Sean: This is the first business in which I’ve raised funding, and I couldn’t have done it without my co-founder, Jesse Berns . For my last business, I started with something small with a few people, found product-market fit, and grew that. With Go, we started with a much more grand vision in mind, so it made sense to operate more like a traditional Silicon Valley startup, raising capital and growing the team quickly. With all startups, you’re operating with very few known facts, but when you raise money everything just tends to get bigger, faster, and I always say this is like ‘operating on hard mode’ — but in our case, it’s worth it. Our goal with Go is to help people manage their friendships in the same way that LinkedIn helps people manage their professional lives, and if we’re successful, that’ll entirely change how people make plans and optimize their friendships for more time together face-to-face. It’s built to inspire us to live our ideal lives, whether that’s basement art shows, unforgettable live music, lunch with friends at a special place that could only exist in your neighborhood, or a slow bike ride down by the river. It’s built for the mundane and the thrilling and everything in between. We’re at a really exciting moment in history where all the trends — adoption of mobile, the upcoming end to the pandemic — are going to enable a culture where people want to find humanity and joy in person, and human-facing tech is going to have a big impact in the next few years. With Go, we’re really excited to be part of that. Looking to build something cool? Get started with the MongoDB for Startups program.

May 5, 2021

Built With MongoDB: Buffer

I first became a fan of Buffer during graduate school. While managing social marketing for student clubs and conferences, I relied on Buffer to manage our fun marketing campaigns. Buffer is a popular social media tool that enables small businesses and content creators to plan, publish, and analyze marketing campaigns across social channels. It serves 67,536 customers in more than 85 countries. The company has over $21M in annual recurring revenue and has been in business for 10 years now. I recently had the opportunity to speak with Dan Farrelly, Buffer’s CTO, about the fast-growing company, his experience with MongoDB for Startups, and the challenges of growing into a CTO position. MongoDB: Let’s go back to February 2014. At that time, Buffer was a much smaller company, with only about 15 people compared with the more than 80 people now. What drew you to join? Dan: Hands down, the culture. There were two things that were unique about Buffer at that time: First, it was an entirely remote team. This was rare in the pre-pandemic world. Second, there was incredible transparency both inside and outside the org. The company was so open about salary that on the Buffer Jobs page, it had an estimated salary calculator based on role and experience. Internally, all revenue numbers and company metrics were accessible to the entire team. The executives being an open book enabled trust and free communication across the organization. And like any startup, we were all-in. Early on, I remember being at a taco shop on a Friday evening when the then-CTO texted me that the servers were crashing. I opened up my laptop at the restaurant and just started troubleshooting, doing whatever I could to try to mitigate the issue. Many people depended on us to manage their social identities, and so with a taco in one hand and a phone in the other, we figured it out. 
Working at a startup is such an incredible learning curve; you have to be scrappy, push the boundaries, and find creative ways to deliver results. MongoDB: Why did the team decide to build with MongoDB? Dan: Our culture has always been engineering-centric, focused on shipping code as soon as it’s ready for production. We encourage continuous delivery of our applications. MongoDB’s products resonate with that lean culture. MongoDB doesn’t require schema migrations; the flexibility and ease of use enabled us to practice the type of engineering we wanted. MongoDB became our partner in being fast and delivering often. An additional benefit was the ability to scale easily: one type of application we were building (content scheduling for social media) had a massive collection of data that had to be scheduled, which required very high throughput; we were posting hundreds of thousands of times a day for social media accounts. MongoDB Atlas allowed us to scale and ensured we didn’t have to worry about our database over the years. MongoDB: Had you used MongoDB before joining Buffer? Dan: I had taken a MongoDB University course in 2012 focused on MongoDB for Node.js developers, and I had built a few side projects and prototypes with MongoDB. The course itself was fantastic: it not only talked about basic things such as setting up replication, sharding, and how the database itself works, but it also talked about some of the more complex elements (how drivers work, write concern, and fully leveraging the database). But the best way to learn about MongoDB was putting out fires at Buffer. Early on, we had monitoring and scaling issues, not with the database but with the code, and our team had to get smart about diagnosing specific issues in our application. MongoDB: What advice do you have for an engineer who wants to grow into a CTO position someday? Dan: Engineers can pursue their own roles and do a really good job while still having a limited perspective of the company. 
In order to become a CTO, you really need to broaden that perspective, and understand how technical strategy supports business goals. The CTO doesn’t have to be the most technical person on the team, but has to have a well-rounded view of the business and also effectively communicate across the stack. Transparency at Buffer helped me develop a wider perspective of the business. If you have ambitions to grow into a CTO role, build relationships across the organization — on the technical and business sides — and think strategically about how the code you ship drives business metrics. Looking to build something cool? Get started with the MongoDB for Startups program.

April 28, 2021

Reducing Queue Times by Using Speculative Execution

When solving concurrency problems in software, the simplest solution is often to make the trickiest part of the problem serial. Here at MongoDB, this is exactly the approach we took to implement a commit queue, where engineers submit code changes to be tested and then merged into a repository. This worked well for many smaller repositories, but for large ones such as the MongoDB Server , testing submissions one at a time proved to be too slow, with engineers sometimes waiting hours for their code to finally make it into the repository. To solve this challenge, we introduced some speculative execution on top of our original approach, which reduced the wait time for a typical week by 62%. Background Many of the engineers at MongoDB submit their code changes to a commit queue, which runs a basic set of tests on these changes before merging them to the correct repository. The main difference between the commit queue running the tests and an engineer running the tests is that the commit queue tests with the latest changes to the code base, whereas an engineer has checked out the code base at some point in the past. To ensure that it has the latest changes, the commit queue tests only one set of changes at a time before either merging the changes if the tests pass, or rejecting them and notifying the author if the tests fail. This serial approach makes the system easy to understand, but it also presents an optimization opportunity to reduce the time spent waiting for tests to start. Design Approach Parallelization The only part of this system we needed to keep serial was the part that merged changes into the repository, because this ensures that changes would be merged in the same order in which they were submitted. By far, the slowest part of the commit queue is actually running the tests, and this is the work that we wanted to split among multiple machines. Let’s assume as an example that all submissions to the commit queue take 10 minutes to run. 
Let’s also assume that in one day there are 30 submissions to the commit queue at roughly the same time. With the previous requirement that the queue runs serially, this means it would take 300 minutes to get through all the submissions. If we parallelize testing the submissions among 30 machines, it would take only 10 minutes of actual time from when the last change was submitted to get through all the submissions. Speculative Execution With a serial queue, each successful submission checks out the latest code in the repository, applies its changes, runs tests, then commits the code back to the repository before the next submission starts. If we do these steps in parallel however, checking out the latest code in the repository will not include the changes from submissions that would have merged before the one being tested. Parallelizing our tests requires some extra steps to ensure that submissions run tests with the code changes from prior submissions. In order to know what code changes should be applied to which tests, the commit queue must still maintain the concept of an order for each submission. That way, the third entry in the queue will know that it must apply the changes from the first and second entries, in addition to its own code changes. If any test for a submission fails, it’s rejected from the queue and any submissions after it are rerun without the code changes from the one that failed. If all tests for a submission finish running, the submission will wait to be merged until the one immediately in front of it is merged. Performance Considerations Testing with merged code changes like this requires that most of the tests pass; otherwise the system will do a lot more work than it would have done if it tested submissions one at a time, and we lose all the benefits from parallelism. 
In the worst-case scenario where nothing passes, the nth submission in the queue would need to be restarted each time something in front of it fails, so it runs n times, leading to $1 + 2 + \dots + n = n(n+1)/2$ total times that any submission is run. This means that if engineers add 10 submissions to the commit queue, the new parallel approach runs tests as many as 55 times, whereas with the old serial approach the tests would always run 10 times. Maybe this worst-case scenario isn’t a big deal if the majority of submissions pass (and at MongoDB, 85% of them do). However, we’d like to guarantee that an unusually bad day doesn’t make the machines running the tests do an excessive amount of unneeded work. To make this guarantee possible, we inserted a checkpoint into the queue, so that only the batch of submissions in front of the checkpoint is running tests. In the example of 10 submissions to the queue, placing the checkpoint after submission No. 3 would mean that the first three submissions start running tests while submissions No. 4 and later wait until the first three finish. It’s totally possible that everything still fails, but adding this checkpoint prevents us from doing too much extra work. With the checkpoint, a queue of length n would run, in the worst case, $(n \backslash f) \cdot \frac{f(f+1)}{2} + \frac{(n \,\%\, f)\,((n \,\%\, f) + 1)}{2}$ total submissions, where f is the position of the checkpoint, \ is the integer division operator, and % is the modulus operator. If engineers add 10 submissions to the queue and the checkpoint is after submission No. 3, this hybrid approach would run tests as many as 19 times (three full batches of 6 runs each, plus 1 run for the final submission), compared with 55 with a fully parallel and 10 with a fully serial approach. The following infographic helps visualize this example. Colors represent the current status of the submission: green means successful, red means failed, yellow means in progress, and gray means not yet started. Results The graph below depicts the average length of time a submission would wait before it started running tests for a representative week when processing submissions serially. 
Contrast these times with the graph below, which shows a representative week with the hybrid approach. For the depicted weeks, the overall average time dropped from 1,238 seconds with the serial approach to 469 seconds with the hybrid approach, a reduction of 62%. Conclusion With this hybrid approach of parallelizing the longest-running parts of the system but keeping key parts serial, we were able to reap the benefits of each approach. We saw drastic reductions in wait times while still maintaining the concept of an ordering for our commit queue. What led us to this approach were the requirements that the result should be noticeably faster for typical sizes of the problem (a queue with one to nine submissions), but could not be drastically slower in the worst-case scenario. These two guiding principles will often yield designs that work well in real-world scenarios, even though they may not handle all edge cases gracefully.
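The worst-case figures from the Performance Considerations discussion (10, 55, and 19 test runs for a queue of 10) can be reproduced with a few lines of Python. This is an illustrative sketch rather than the commit queue's actual code: the function names are made up, the closed-form expressions are written to match the quoted figures, and the simulation assumes every submission fails and that each batch in front of the checkpoint finishes before the next batch starts.

```python
def serial_runs(n):
    """Serial queue: each submission is tested exactly once."""
    return n


def parallel_worst_case_runs(n):
    """Fully parallel queue where everything fails: 1 + 2 + ... + n runs."""
    return n * (n + 1) // 2


def hybrid_worst_case_runs(n, f):
    """Checkpoint after position f: full batches of f, plus one partial batch."""
    full_batches, remainder = divmod(n, f)
    return (full_batches * parallel_worst_case_runs(f)
            + parallel_worst_case_runs(remainder))


def simulate_all_fail(n, f):
    """Count test runs by stepping through the queue when every submission fails."""
    runs = 0
    waiting = n
    while waiting > 0:
        batch = min(f, waiting)   # submissions in front of the checkpoint
        waiting -= batch
        while batch > 0:
            runs += batch         # every submission in the batch (re)runs
            batch -= 1            # the head fails and is rejected
    return runs


def percent_reduction(before, after):
    """Relative drop from `before` to `after`, as a rounded percentage."""
    return round(100 * (before - after) / before)
```

For a queue of 10 with the checkpoint after submission No. 3, hybrid_worst_case_runs(10, 3) and simulate_all_fail(10, 3) both give 19, and percent_reduction(1238, 469) gives the 62% wait-time reduction reported in the results.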

April 27, 2021

Built With MongoDB: Queenly

Queenly founders Trisha Bantigue and Kathy Zhou grew up in low-income immigrant families, trying to balance their cultural upbringing with their desire to fit into their American lifestyle. To earn scholarships to pay for college, they both started participating in beauty pageants. “Beauty pageants provide young women with the opportunity to kickstart their careers,” says Queenly Co-founder & CTO Kathy. “And one of the core parts of the pageant system is the evening gown — that was the spark of inspiration for us wanting to tackle the whole fashion industry.” According to Kathy, women in America — especially outside the coastal cities — end up attending around 15 special occasions a year. “Whether it’s prom, beauty pageants, or other formal occasions, women need cost-effective formal outfits,” she says. After working across leading Silicon Valley companies, Trisha and Kathy teamed up to build Queenly , a marketplace and search engine for the formalwear industry. “No one has created a robust search engine for formal dresses,” says Kathy. “People are picky about formal attire — there’s so much consideration that goes into it, from neckline to hemline, silhouettes, colors, and fabrics. We’re trying to build a marketplace, do complex queries, and provide personalized recommendations.” Queenly has 80,000 registered users and 50,000 dresses listed. The team of five ( which is hiring! ) is backed by Y-Combinator. We recently sat down with Kathy to learn more about how her pageant experience has informed her career, her experience with MongoDB, and the challenges in building a formalwear business. MongoDB: What was your first pageant experience like? Kathy: It was really eye-opening: I have always been a shy person, and my number one fear is public speaking. What people don’t realize about pageants is that along with having to learn how to dress well, you also have to be able to speak well. You have to learn to speak from your heart and to communicate well. 
Gaining the confidence and soft skills to answer those pageant questions has also helped me in my career, helping me grow from an engineer to engineering leadership. One of the most memorable questions from an early pageant was about what’s the most important thing you want to do in your pageant regime. I talked about how it’s okay for young women to both be nerdy and girly — you should be able to embrace all these different sides of yourself, and not fear falling into one box of being. I wish someone had told me that when I was younger. Now, I’m honored to be able to embrace both sides as a CTO, a Y-Combinator female founder, and a beauty pageant contestant. MongoDB: Building a two-sided marketplace is a challenge. What did the minimum viable product look like? Kathy: The MVP was very rough — I started by coding an iOS app part-time and during the weekends while I was still employed at Pinterest. The goal was to tackle the supply-side of the marketplace first to get people to upload dresses, so I optimized for creating a really easy dress-upload experience. You could only search for one size and one color at a time. Now, we’re using natural language processing query for search, and also a larger combination of different dress-type attributes. We’re also including reverse image search, and I’ve been working on tailored user recommendations. MongoDB: How did you make decisions for your technical back end? Kathy: Initially, we had very basic search and exploration using Google’s Firebase. It was very easy to set up and has a fairly good UI tooling, but its query capacity was something we were quickly outgrowing. At our stage of company, non-relational storages are a really great decision for the sake of speed and adaptability. As we’re working towards product-market fit, we need to move quickly in launching new user experiences and reworking old ones, so it’s important to have that flexibility in restructuring and reshaping our data. 
That’s when I went to MongoDB and realized that it was a really quick migration and had all the capacities and flexibility we need. MongoDB is great for JavaScript developers. I started with a background in front end, with a foundation in HTML / CSS and JavaScript, and it was very easy to pick up MongoDB. It’s also going to help a lot of emerging developers, and those coming out of coding bootcamps, get started on the back end more quickly. As people say, we were building the airplane as we were flying. We needed to move fast so people could access and search for dresses quickly. Many of our users are women who live in the Midwest and the South where they may not have amazing internet access, so speed and performance are pretty important. MongoDB: Are there specific features of MongoDB you're using, aside from Atlas? Kathy: The most important aspects are the core functionality and the monitoring tools and dashboards. Those are useful and come right out of the box. I’ve been meaning to take a look at search capabilities; I think it’s cool that there are indexes right out of the box. We’re trying to adapt our product as it goes, and figure out how to tag and enable different attributes on a dress. MongoDB: What was the last good technical book or article you read? Kathy: I really enjoy reading the Towards Data Science publication on Medium. They do a good job of covering different use cases as well as making algorithms and data science/machine learning concepts from different fields more approachable. Beyond that, I read several fashion magazines and pageant blogs because I think CTOs, and the technical side of the business, should really understand the users. I try to keep up with trends in fashion and retail to better understand the opportunity, and use that to influence how our product functions. Looking to build something cool? Get started with the MongoDB for Startups program.

April 20, 2021
