Articles, announcements, news, updates and more
MongoDB is Going on a World Tour
March 9, 2023
Visualizing Your MongoDB Atlas Data with Atlas Charts
MongoDB Atlas is the leading multi-cloud developer data platform. We see some of the world’s largest companies in manufacturing, healthcare, telecommunications, and financial services all build their businesses with Atlas at their foundation. Every company comes to MongoDB with a need to safely store operational data. But all companies also need to analyze data to gain insights into their business, and data visualization is core to establishing that real-time business visibility. Data visualization enables the insights required to take action, whether that’s on key sales data, production and operations data, or product usage to improve your applications. The best way to do this as an Atlas user is by using Atlas Charts – MongoDB’s first-class data visualization tool, built natively into MongoDB Atlas.
Why choose Charts
First, Charts is natively built for the document model. If you’re familiar with MongoDB, you should be familiar with documents. The document model is a data model made for the way developers think. And with Charts, you can take your data from documents and collections in Atlas and visualize it with no ETL, data movement, or duplication. This speeds up your ability to discover insights.
Second, Charts supports all cluster configurations you can create in Atlas, including dedicated clusters, serverless instances, data stored in Online Archive, and federated data in Atlas Data Federation. Typically when you learn about a company’s integrated products and services, you find some “gotchas” or limitations that make any benefits come at a significant cost. In the case of a MongoDB Atlas customer, that could come in the form of someone finding out that a cluster configuration option isn’t supported by Charts. But that will never be the case. If you create and manage your application data in Atlas, you can visualize it in Charts. That’s it.
Third, Charts is a robust data visualization tool with a variety of chart types, extensive customization options, and interactivity. Compared to other options in the business intelligence market, you get the same key benefits, without all the complexity. You can learn how to use Charts in a few hours and you can easily teach your team. It’s the simplest data visualization solution for most teams. Fourth, the value of Charts can extend beyond individual use cases, with sharing and embedding . This lets you both flexibly share charts and dashboards with your team, as well as embed them into contexts that matter most to your data consumers, such as in a blog post or inside your company’s wiki. Finally, Charts is free for Atlas users up to 1GB per project per month, which covers moderate usage for most teams. There are no seat-based licensing fees associated with Charts, so no matter how many team members you have, Charts will remain a low-cost, if not zero cost solution for your data visualization needs. Beyond the included free usage, it’s just $1/GB transferred per month. You can check out more pricing details here . How to use Charts The best way to learn how to use Charts is to simply give it a try. It’s free to use and we have a variety of sample dashboards you can use to get started. But let’s walk through some basics to help illustrate the kinds of visualizations that Charts can enable. Charts makes visualizing your data easy by automatically making your Atlas deployments (any cluster configuration) available for visualization. If you’re a project owner, you can manage permissions to data sources in Charts. We could write an entire blog post on data sources, but if you’re just getting started, just know that your data is made easily available in Charts unless your project owner intentionally hides it. Create a dashboard Everything in Charts starts with a dashboard and creating a dashboard is easy. 
Simply select the Add Dashboard button at the top right of the Charts page in Atlas . From there, you’ll fill in some basic information like a title and optional description, and you’re on your way. Here’s what one of our new sample dashboards looks like. They are a great place to start: Build a chart Once you have a dashboard created, you can add your first chart. The chart builder gives you a simple and powerful drag and drop interface to help you quickly construct charts. The first step is selecting your data source: Once you have a data source selected, simply add desired fields into your chart and start customizing. The example below uses our IoT sample dashboard dataset to create a bar chart displaying the total distance traveled by different users. From there you can add filters and further customize your chart by adding custom colors, data labels, and more. The chart builder even allows you to write, save, and share queries and aggregation pipelines as shown below. You can learn more in our documentation. Play around with the chart builder to get familiar with all of its functionality. Share and embed A chart can be useful in itself to individual users, but we see users get the most benefit out of Charts when sharing visualizations with others. Once you have created a dashboard with one or more charts, we offer a variety of options letting you share your dashboards with your team, your organization, or via a public link if your data is not sensitive. If you would rather embed a chart or dashboard where your team is already consuming information, check out Charts embedding functionality. Charts lets you embed a chart or dashboard via iframe or SDK, depending on your use case. Check out our embedding documentation to learn more. That was just a brief overview of how to build your first charts and dashboards in Atlas Charts, but there’s a lot more functionality to explore. 
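To make the aggregation pipeline idea more concrete, here is a minimal sketch of the kind of pipeline that could sit behind the "total distance traveled by user" bar chart described above. The field names `user_id` and `distance_km` and the sample documents are assumptions for illustration; the actual schema of the IoT sample dataset is not given here. The pipeline uses MongoDB's `$group`, `$sum`, and `$sort` stages, and the small function below it is only a pure-Python equivalent so the result can be inspected without a database.

```python
from collections import defaultdict

# Hypothetical documents; the field names are assumptions for illustration,
# not the schema of the IoT sample dataset.
trips = [
    {"user_id": "ana", "distance_km": 12.5},
    {"user_id": "ben", "distance_km": 3.0},
    {"user_id": "ana", "distance_km": 7.5},
]

# Aggregation pipeline in MongoDB's query language: group by user, sum the
# distance per group, then sort by the total in descending order.
pipeline = [
    {"$group": {"_id": "$user_id", "total_distance": {"$sum": "$distance_km"}}},
    {"$sort": {"total_distance": -1}},
]


def total_distance_by_user(docs):
    """Pure-Python equivalent of the pipeline above, for illustration only."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["user_id"]] += doc["distance_km"]
    rows = [{"_id": user, "total_distance": total} for user, total in totals.items()]
    return sorted(rows, key=lambda row: row["total_distance"], reverse=True)


result = total_distance_by_user(trips)
```

Each row of `result` corresponds to one bar in the chart: the `_id` is the category and `total_distance` is the bar's value.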
For a full walkthrough, watch our product demo.
Atlas Charts is the only native data visualization tool built for the document model, and it’s the quickest and easiest way to get started visualizing data from Atlas. We hope this introduction helps you get started using Charts to gain greater visibility into your application data and make better decisions based on it. Get started with Atlas Charts today by logging into or signing up for MongoDB Atlas, deploying or selecting a cluster, and navigating to the Charts tab to activate for free.
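As a rough illustration of the pricing described earlier (1GB free per project per month, then $1 per GB transferred), the following sketch estimates a monthly bill. It assumes simple linear billing with no rounding or minimum charge, which the article does not specify; the function name is hypothetical, and the official pricing page remains the reference.

```python
FREE_GB_PER_PROJECT = 1.0  # free monthly transfer per project, per the article
PRICE_PER_EXTRA_GB = 1.00  # USD per GB beyond the free allowance, per the article


def estimate_monthly_charts_cost(gb_transferred: float) -> float:
    """Estimate the monthly cost in USD for one project.

    Assumes linear billing with no rounding, which is a simplification.
    """
    if gb_transferred < 0:
        raise ValueError("gb_transferred must be non-negative")
    billable_gb = max(0.0, gb_transferred - FREE_GB_PER_PROJECT)
    return billable_gb * PRICE_PER_EXTRA_GB


light_usage = estimate_monthly_charts_cost(0.5)  # within the free allowance
heavy_usage = estimate_monthly_charts_cost(4.0)  # 3 GB over the allowance
```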
Why MongoDB’s Partner Team is Focused like a Laser, Not a Flashlight
Four years ago, I wrote an article about how our Partner and Sales teams work together to ensure success. Since then, our Partner organization has grown five times in size and become even more of a competitive differentiator for MongoDB. As we continue to build lasting relationships with our partners and become even more strategic in how we leverage our partnerships, I’m reflecting on how far the Partner organization has come and where we’re headed. The Partner organization is the x-factor for MongoDB It starts with the customers, but more specifically, developers. Developers are creating some of the most innovative and modern applications with MongoDB, but our developer data platform is only one component of their tech stack. That’s why it’s essential to have an ecosystem of companies who help developers write or modernize their software faster. For MongoDB, this could be system integrators, cloud providers, ISVs who embed MongoDB into their products, technology partners who want to integrate with us, or resellers who enable us to sell MongoDB in new markets and regions. Most companies have a strategy for each and a team that manages these relationships, but there are a few things that make MongoDB’s Partner organization different. First, the people we hire. We look for individuals who have a sales-first mentality, are willing and able to generate pipeline, and can position the value of MongoDB. It’s extremely important for our Partner team to show ROI to our Sales teams, and I’d argue that if your Partner organization can’t do that, you might not need them. As part of the Partner team at MongoDB, you have the opportunity to master your sales skills and be rewarded for your success in finding new partnerships. One of our core MongoDB values is “Own What You Do” and it’s embodied every day on the Partner team. We demand excellence from ourselves. We take accountability for our actions and our success. We are empowered to make things happen. 
The second thing that sets MongoDB apart is that we manage partnerships like a laser, not a flashlight. We do not measure success by the number of partners we have. We prefer to deeply invest resources in a handful of alliances while we create an ecosystem funnel to drive the next wave of investments. We look for partnerships with organizations that our customers have told us they’d like us to work better with. Though we have over 1,000 partners, we put most of our horsepower into the top 50 based on this feedback. Lastly, the opportunity at MongoDB is enormous. If you are looking to work with a product that people love, and you believe there is an opportunity to be well-compensated for selling and building full solutions around a product, you’ll find that at MongoDB. Driving focus via the Partner Specialist teams At the beginning of this year, we created dedicated specialist teams for Cloud, System Integrator, ISV, VAR, and Tech partners. Customers have told us time and time again that they wanted us to become more intimate with their use cases and the associated ecosystem, and we listened. For example, we now have specialized teams for each cloud partner who know their products inside out and focus on strengthening the relationship by sourcing new opportunities for our sales force. This isn’t something you find in most Partner organizations, as it’s more common for teams to be generalists opposed to specialists. We began experimenting with specialization in 2021, and a highlight of this specialization is our partnership with Amazon Web Services (AWS). In the past, MongoDB and AWS were viewed as competitors rather than partners. In 2021, both sides realized that it’s better to work together and decided to dedicate individuals to build a partnership that has since resulted in an incredible number of co-sell wins. AWS has leaned into MongoDB and continues to position MongoDB Atlas as a preferred database for customers. 
This makes MongoDB one of the top three data partners that AWS has globally, and AWS is now MongoDB’s largest partnership in the world.
Scaling without diluting impact
MongoDB’s Partner organization has quintupled in size since 2019. We have partners in almost every major location around the world and teams who provide regional coverage. With the ROI we’ve seen from specialization, we’ve invested in more specialists and therefore can provide more dedicated resources to each partner. MongoDB’s Partner organization is known as a place with a winning culture where people consistently deliver results. We’ve had many internal transfers from employees who joined MongoDB in Sales, Sales Development, or Marketing and decided to transition into a role on the Partner team. Similarly, our team is focused on providing opportunities for growth. The number of individuals who joined the Partner team as individual contributors and have since been promoted into Director and VP roles is extraordinary. For example, our VP of System Integrator Partner Specialists, Global Lead of Accenture Partner Specialists, RVP of Capgemini Partner Specialists, RVP of Cloud Programs, Global Lead of AWS Partner Specialists, and RVP of Azure Partner Specialists all began their careers as individual contributors here at MongoDB. As we grow our Partner organization, diversity of background, thought, and experiences will continue to be a key differentiator for us. We value different perspectives and view diversity as a way to better serve our customers. Diversity drives a culture of innovation, and investing in inclusion helps us serve customers in all markets, giving us a competitive advantage.
The future of MongoDB's Partner organization
I’m very excited about our coming year. We continue to look for the next partnership to break records with. Whether it's Alibaba, IBM, Databricks, Carahsoft, Microsoft, or Google, working with partners to find new workloads is key to MongoDB’s success.
MongoDB plans to continue to invest directly in partners via MongoDB Ventures as part of this strategy. We also take great pride in promoting folks into leadership positions, and we expect even more of that in the year ahead. Our leaders and I live by one of John McMahon’s mottos: “Too many companies think culture is ping-pong, foosball, and beer taps. Helping people win is a culture. Teaching them how to win on their own is a culture. If people aren’t learning, earning, growing, and being promoted, they’re not staying around for the pool table.” This is why we hope you are interested in joining us. We have great products, specialized partnerships, and most importantly, a winning team of fantastic leaders. Want to be part of a team that takes ownership and makes their work matter? View our open roles today.
How Much is Your Data Model Costing Your Business?
Economic volatility is creating an unpredictable business climate, forcing organizations to stretch their dollars further and do more with less. Investments are under the microscope, and managers are looking to wring every ounce of productivity out of existing resources. IT spend is a concern and many IT decision-makers aren't sure what's driving costs. Is it overprovisioning? Cloud sprawl? Shadow IT? One area that doesn't get a lot of attention is how the data is modeled in the database. That's unfortunate because data modeling can have a major impact in terms of the cost of database operations, the instance size necessary to handle workloads, and the work required to develop and maintain applications. Pareto patterns Data access patterns are often an illustration of the Pareto Principle at work, where the majority of effects are driven by a minority of causes. Modern OLTP applications tend to work with data in small chunks. The vast majority of data access patterns (the way applications access and use data) work with either a single row of data or a range of rows from a single table. At least that's what we found at Amazon , looking at 10,000 services across all the various RDBMS based services we deployed. Normalized data models are quite efficient for these simple single table queries, but the less frequent complex patterns require the database to join tables to produce a result, exposing RDBMS inefficiencies. The high time complexity associated with these queries meant significantly more infrastructure was required to support them. The relational database hides much of this overhead behind the scenes. When you send a query to a relational database, you don't actually see all the connections opening up on all the tables, or all the objects merging. 
Even though 90% of the access patterns at Amazon were for simple things, the 10% that were doing more complex things were burning through CPU to the point that my team estimated they were driving ~50% of infrastructure cost. This is where NoSQL data modeling can be a game-changer. NoSQL data models are designed to eliminate expensive joins, reduce CPU utilization, and save on compute costs.
Modeling for efficiency in NoSQL
There are two fundamental approaches to modeling relational data in NoSQL databases:
Embedded Document - All related data is stored in a single rich document which can be efficiently retrieved when needed.
Single Collection - Related data is split out into multiple documents to efficiently support access patterns that require subsets of a larger relational structure. Related documents are stored in a common collection and contain attributes that can be indexed to support queries for various groupings of related documents.
The key to building an efficient NoSQL data model and reducing compute costs is using the workload to influence the choice of data model. For example, a read-heavy workload like a product catalog that runs queries like "get all the data for a product" or "get all the products in a category" will benefit from an embedded document model because it avoids the overhead of reading multiple documents. On the other hand, a write-heavy workload where writes are updating bits and pieces of a larger relational structure would run more efficiently with smaller documents stored in a single collection, which can be accessed independently and indexed to support efficient retrieval when all the data is needed. The final choice depends on the frequency and nature of the write patterns and whether or not there's a high-velocity read pattern operating concurrently. If your workload is read-intensive, you want to get as much as you can in one read.
For a write-intensive workload, you don't want to have to rewrite the full document every time it changes. Joins increase time complexity. In NoSQL databases, depending on the access pattern mix, all the rows from the relational tables are stored either in a single embedded document or as multiple documents in one collection that are linked together by indexes. Storing multiple related documents in a common collection means there is no need for joins. As long as you're indexing on a common dimension across documents, you can query for related documents very efficiently. Now imagine a query that joins three tables in a relational database and your machine needs to do 1,000 of them. You would need to read at least 3,000 objects from multiple tables in order to satisfy the 1,000 queries. With the document model, by embedding all the related data in one document, the query would read only 1,000 objects from a single collection. Machine wise, having to merge 3,000 objects from three tables versus reading 1,000 from one collection will require a more powerful and expensive instance. With relational databases, you don't have as much control. Some queries may result in a lot of joins, resulting in higher time complexity which translates directly into more infrastructure required to support the workload. Mitigate what matters In a NoSQL database, you want to model data for the highest efficiency where it hurts the most in terms of cost. Analytical queries tend to be low frequency. It doesn't matter as much if they come back in 100 ms or 10 ms. You just want to get an answer. For things that run once an hour, once a day, or once a week, it's okay if they're not as efficient as they might be in a normalized relational database. Transactional workloads that are running thousands of transactions a second need to process as efficiently as possible because the potential savings are far greater. 
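The two modeling approaches and the join arithmetic discussed above can be sketched with plain Python dictionaries standing in for documents. Everything here is illustrative: the product, review, and price fields are hypothetical, the dictionary keyed by `product_id` is only a stand-in for a database index, and the read counts simply restate the 1,000-query example from the text.

```python
from collections import defaultdict

# Embedded document: one read returns the product together with its
# related data (hypothetical fields).
embedded_products = {
    "p1": {
        "_id": "p1",
        "name": "Trail shoe",
        "category": "footwear",
        "reviews": [{"user": "ana", "stars": 5}],
        "prices": [{"currency": "USD", "amount": 120}],
    }
}

# Single collection: related data is split into small documents that share
# an indexed attribute (`product_id`) and can be updated independently.
single_collection = [
    {"_id": "p1", "type": "product", "product_id": "p1", "name": "Trail shoe"},
    {"_id": "r1", "type": "review", "product_id": "p1", "user": "ana", "stars": 5},
    {"_id": "c1", "type": "price", "product_id": "p1", "currency": "USD", "amount": 120},
]

# Stand-in for an index on `product_id`: related documents are found by a
# lookup on the shared attribute, with no join across tables.
index_by_product = defaultdict(list)
for doc in single_collection:
    index_by_product[doc["product_id"]].append(doc)


def objects_read(queries: int, objects_per_query: int) -> int:
    """Minimum number of objects touched to answer `queries` queries."""
    return queries * objects_per_query


joined_reads = objects_read(1000, 3)    # three-table join, 1,000 queries
embedded_reads = objects_read(1000, 1)  # one embedded document per query
```

The counts only capture how many objects are read; they say nothing about document size, so a very large embedded document can still be the wrong choice for a write-heavy pattern.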
Some users try to practice these data modeling techniques to increase efficiency in RDBMS platforms, since most now support document structures similar to MongoDB. This might work for a small subset of workloads. But row-based storage is designed for relatively small rows that are the same size. It works well for small documents, but when you start to increase the size of the row in a relational database, it requires off-row storage. In Postgres this is called TOAST (The Oversized-Attribute Storage Technique). This circumvents the size limit by putting the data in two places, but it also decreases performance in the process. The row-based storage engines used by modern RDBMS platforms were not designed for large documents, and there is no way to configure them to store large documents efficiently.
Drawing out the relationship
The first step we recommend when modeling data is to characterize the workload by asking a few key questions: What is the nature of the workload? What is the entity relationship diagram (ERD)? What are the access patterns? What is the velocity of each pattern? Where are the most important queries that we need to optimize? Identifying the entities and their relationships to each other is going to form the basis of our data model. Once this is done, we can begin to distill the access patterns. If it's a read-heavy workload like the product catalog, you'll most likely be working with large objects, which is fine. There are plenty of use cases for that. However, if you're working with more complex access patterns where you're accessing or updating small pieces of a larger relational structure independently, you will want the data separated into smaller documents so you can efficiently execute those high-velocity updates. We teach many of these techniques in our MongoDB University course, M320: MongoDB Data Modeling.
Working with indexes
Using indexes for high-frequency patterns will give you the best performance.
Without an index, you have to read every document in the collection and examine it to determine which documents match the query conditions. An index is a B-tree structure that can be parsed quickly to identify documents that match conditions on the indexed attributes specified by the query. You may choose to not index uncommon patterns for various reasons. All indexes incur cost as they must be updated whenever a document is changed. You might have a high velocity write pattern that runs consistently and a low velocity read that happens at the end of the day, in which case you'll accept the higher cost of the full collection scan for the read query rather than incur the cost of updating the index on every write. If you are writing to a collection 1,000 times a second and reading once a day, the last thing you want to do is add an index update for every single write just to make the read efficient. Again, it depends on the workload. Indexes in general should be created for high-velocity patterns, and your most frequent access patterns should be covered by indexes to some extent, either partially or fully. Remember that an index still incurs cost even if you don't read it very much or at all. Always make sure when you define an index that there is a good reason for it, and that good reason should be that you have a high frequency access pattern that needs to use it to be able to read the data efficiently. Data modeling and developer productivity Even after you've optimized your data model, cost savings will continue to accrue downstream as developers find that they can develop, iterate, and maintain systems far more efficiently than in a relational database. Specific document design patterns and characteristics of NoSQL can reduce maintenance overhead and in many cases eliminate maintenance tasks altogether. 
For example, document databases like MongoDB support flexible schema which eliminates the need for maintenance windows related to schema migrations and refactoring of a catalog as with RDBMS. A schema change in a relational database almost always impacts ORM data adapters that would need to be refactored to accommodate the change. That's a significant amount of code maintenance for developers. With a NoSQL database like MongoDB, there's no need for cumbersome and fragile ORM abstraction layers. Developers can store object data in its native form instead of having to normalize it for a tabular model. Updating data objects in MongoDB requires almost zero maintenance. The application just needs to be aware documents may have new properties, and how to update them to the current schema version if they don’t. MongoDB will lower license fees and infrastructure costs significantly, but possibly the biggest savings organizations experience from moving away from RDBMS will come from reduced development costs. Not only is there less code overall to maintain, but the application will also be easier to understand for someone who didn't write the code. MongoDB makes migrations far simpler and less prone to failure and downtime. Applications can be updated more frequently, in an easier fashion, and without stressing about whether a schema update will fail and require a rollback. Overall, maintaining applications over their lifetime is far easier with NoSQL databases like MongoDB. These efficiencies add up to significant savings over time. It's also worth mentioning that a lot of up-and-coming developers see relational databases as legacy technology and not technology they prefer to use. With MongoDB it is easier to attract top talent, a critical factor in any organization's ability to develop best-of-breed products and accelerate time-to-value. 
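The point above about applications being "aware documents may have new properties, and how to update them to the current schema version" can be sketched as a lazy upgrade on read. This is a minimal illustration, not MongoDB's API: the `schema_version` field, the upgrade function, and the example change (splitting `name` into `first_name` and `last_name`) are all assumptions made for the example.

```python
CURRENT_SCHEMA_VERSION = 2


def upgrade_v1_to_v2(doc: dict) -> dict:
    """Hypothetical change: split `name` into `first_name` and `last_name`."""
    upgraded = dict(doc)  # copy so the stored document is not mutated
    first, _, last = upgraded.pop("name", "").partition(" ")
    upgraded["first_name"] = first
    upgraded["last_name"] = last
    upgraded["schema_version"] = 2
    return upgraded


# Maps a schema version to the function that upgrades it by one version.
UPGRADES = {1: upgrade_v1_to_v2}


def to_current_schema(doc: dict) -> dict:
    """Upgrade a document step by step until it matches the current schema."""
    # Documents without a version field are treated as version 1.
    version = doc.get("schema_version", 1)
    while version < CURRENT_SCHEMA_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc


old_doc = {"_id": 1, "name": "Ada Lovelace"}
new_doc = {"_id": 2, "first_name": "Alan", "last_name": "Turing", "schema_version": 2}

upgraded_old = to_current_schema(old_doc)
unchanged_new = to_current_schema(new_doc)
```

With this approach, old and new documents can coexist in one collection, and the application can write the upgraded form back whenever it next updates the document, rather than migrating everything in a maintenance window.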
Uplevel your NoSQL data modeling skills
If you want to start reining in the hidden costs in your software development lifecycle by learning how to model data, MongoDB University offers a special course, M320: MongoDB Data Modeling. There are also dozens of other free courses, self-paced video lessons, on-demand labs, and certifications with digital badges to help you master all aspects of developing with MongoDB.
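Returning to the indexing trade-off discussed earlier (1,000 writes a second against one read a day), a back-of-the-envelope model shows why an index can cost more than it saves. This is a deliberately crude sketch: it treats one index update, one indexed read, and one scanned document as equal units of work, and the collection size of 50 million documents is a hypothetical figure chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_cost_with_index(writes_per_second: float, reads_per_day: float,
                          index_update_cost: float = 1.0,
                          indexed_read_cost: float = 1.0) -> float:
    """Work per day when every write also updates the index."""
    index_updates = writes_per_second * SECONDS_PER_DAY * index_update_cost
    return index_updates + reads_per_day * indexed_read_cost


def daily_cost_without_index(reads_per_day: float, docs_in_collection: int,
                             scan_cost_per_doc: float = 1.0) -> float:
    """Work per day when every read scans the full collection."""
    return reads_per_day * docs_in_collection * scan_cost_per_doc


# Workload from the text: 1,000 writes per second, one read per day.
with_index = daily_cost_with_index(writes_per_second=1000, reads_per_day=1)
# Hypothetical collection size of 50 million documents.
without_index = daily_cost_without_index(reads_per_day=1, docs_in_collection=50_000_000)
```

Under these assumptions the index costs about 86.4 million units of work per day, while the single full scan costs 50 million. The balance flips as soon as the read runs more often or the collection grows much larger, which is why the decision depends on the workload.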
Digital Payments - Latin America Focus
Pushed by new technologies and global trends, the digital payments market is flourishing all around the world. The market was valued at over USD 68 billion in 2021 and is expected to grow at double-digit rates over the next decade, with emerging markets leading the way in terms of relative expansion. A landscape once dominated by incumbents (big banks and credit card companies) is now being attacked by disruptors interested in capturing market share. According to a McKinsey study, there are four major factors at the core of this transformation: pandemic-induced adoption of cashless payments, e-commerce, the government push for digital payments, and fintechs.
Interestingly, the pandemic has been a big catalyst in the rise of financial inclusion by encouraging alternative means of payment and new ways of borrowing and saving. These new digital services are in fact easier to access and to consume. In Latin America and the Caribbean (LAC), Covid spurred a dramatic increase in cashless payments: 40% of adults made an online purchase, 14% of whom did so for the first time in their lives.
E-commerce has experienced stellar growth, with a penetration that will likely exceed 70% of the population in 2022. Domestic and global players, including Mercado Libre and Falabella, are pushing digital payment innovation to provide an ever smoother customer experience on their platforms.
Central banks are promoting new infrastructure for near real-time payments, with the goal of providing a cheaper and faster technology for money transfer for both citizens and businesses. PIX is probably the biggest success story. An instant payment platform developed by Banco Central do Brasil (Brazil Central Bank), it began operating in November 2020, and within 18 months, over 75% of adult Brazilians had used it at least once. The network processes around $250 billion in annualized payments, about 20% of total customer spend.
Users (including self-employed workers) can send and receive real-time payments through a simple interface, 24/7 and free of charge. Businesses have to pay a small fee. In the United States, the Federal Reserve has announced it will be launching FedNow in mid-2023, a payment network with characteristics similar to PIX. These initiatives aim to solve issues such as slow settlements and low interoperability between parties.
Incumbent banks still own the lion’s share of the digital payment market. However, fintechs have been threatening this dominance by leveraging their agility to execute fast and cater to customer needs in innovative and creative ways. Without the burden of legacy systems to weigh them down, or business models tied to old payment rails, fintechs have been enthusiastic testers and adopters of new technologies and payment networks. Their mobile and digital-first approach is helping them capture and retain the younger segment of the market, which expects integrated real-time experiences it can consume at the touch of a button. An example is Paggo, a Guatemalan fintech that helps businesses streamline payments by enabling them to share a simple QR code that customers can scan to transfer money.
The payment landscape is not only affected by external forces. Changes coming from within the industry are also reshaping the customer experience and enabling new services:
ISO 20022 is a flexible standard for data interchange that is being adopted by most financial industry institutions to standardize the way they communicate with each other, thus streamlining interoperability. Thanks to the adoption of ISO 20022, it’s more straightforward for banks to read and process messages, which translates into smoother internal processes and easier automation. For end users this means faster and potentially cheaper payments, as well as richer and more integrated financial apps.
3DS2 is being embraced by the credit and debit card payments ecosystem.
It is essentially a payment authentication solution for online shopping transactions. As with ISO 20022, the end user won’t even be aware of the underlying technology and will only experience a smoother, more frictionless checkout. 3DS2 spares the user from being redirected to their banking app for confirmation when buying an item online; everything now happens on the seller’s website or app. This is all done while also enhancing fraud detection and prevention: the new solution makes it harder to use someone’s credit or debit card without authorization. The benefit of adopting 3DS2 is twofold. On the one hand, the user has increased confidence; on the other hand, merchants are happier because of a lower customer abandonment rate, since fear of fraud at checkout is usually one of the main reasons for ditching an online purchase. This solution is especially beneficial for the LAC region, where, despite wide adoption of e-commerce, people are still reluctant to transact online. One of the factors contributing to this oddity is fear of fraud. Cybersource reported that in 2019, a fifth of e-commerce transactions were flagged as potentially fraudulent and 20% were blocked, which is over 6 times the global average. Platforms’ adoption of 3DS2 should therefore encourage online shoppers’ trust.
It is also worth mentioning the role played by blockchain and cryptocurrencies. Networks such as Ethereum or Lightning are effectively a decentralized alternative to the more traditional payment rails. Over the last few years, more and more people have started to use this technology because of its unique features: low fees, fast processing time, and global reach. Latin America has seen an explosion in adoption due to several factors, with remittances and stablecoin payments being highly prominent. Traditional remittance service providers are in fact slower and more expensive than blockchain networks.
Especially in Argentina, an increasing number of self-employed workers are demanding to be paid in USDC or USDT, two stablecoins pegged to the value of the dollar, so that they can stave off inflation.
It is clear that the payment landscape is rapidly evolving. On the one hand, customers expect products and services that integrate seamlessly with every aspect of their digital lives. Whenever an app is perceived as slow, poorly designed, or simply missing some features, the user can easily switch to a competitor’s alternative. On the other hand, the number of players contending for their share of the digital payments market is expanding, driving down margins on traditional products. The only way to successfully navigate this complex environment is to invest in innovation and in creating new business models. There’s no single approach to facing such challenges, but there’s no doubt that every successful business needs to harness the power of data and technology to provide its customers with the personalized, real-time experience they demand. We at MongoDB believe that a solid foundation for achieving that is a highly flexible and scalable developer data platform, allowing companies to innovate faster and better monetize their payment data. Visit our Financial Services web page to learn more!
Women Leaders at MongoDB: Raising the Bar with May Petry
March is Women’s History Month. Our women leaders series highlights MongoDB women who are leading teams and empowering others to own their career development and build together. May Petry, Vice President of Digital and Growth Marketing, discusses the importance of defining your values, being authentic, and “getting comfortable with being uncomfortable.” Tell me a bit about your team. The Digital and Growth Marketing team is focused on finding the next best customer for MongoDB, helping them be wildly successful on Atlas, and accelerating their future growth on our platform. Our growth goals include driving awareness in net new audiences, generating revenue through our self-serve channel, delivering new digital experiences, and growing sales opportunities. What characteristics make a good leader? Good leaders have a clear set of personal values that guide their decisions and define their leadership style. They find joy in not just what their team does but how. A good leader is a ‘bar raiser’ and demonstrates mastery of all the company values. I value authenticity, integrity, empathy, accomplishment, and advocacy in leaders. What has your experience been like as a woman growing your career in leadership? There have been many occasions where I am the only woman and person of color in the room. Early in my career, this was intimidating and lonely, but finding allies helped. I also remember being told to “use my voice.” I was. I just wasn’t being heard. Focusing on how to speak so others listen is a skill to develop. The stakes just get higher as you advance your career. Tell us about some of the biggest lessons you’ve learned throughout your career. I’ll share two. First, I don’t have to be the best at what my team does. I have to be the best in helping my team do what they do best and excel at arranging their outputs, so it’s amplified, highly efficient, and ridiculously impactful. The second is that imposter syndrome doesn’t ever go away. 
It gets worse - use it to fuel your curiosity and empathy, drive collaboration, and help others grow. What’s your advice for building and developing a team? As a leader developing a team, you need to be a role model. Be authentic and vulnerable. Don’t just talk about learning and development - do something about it. Does everyone in your organization have an individual growth plan? Do they know what raising the bar looks like? Do they have regular conversations with their managers for feedback and recognition? That said, everyone is responsible for their own personal and professional growth. Take charge of your destiny by looking for mentors, coaches, and allies. What’s one piece of advice you have for women looking to grow their careers as leaders? Get comfortable with being uncomfortable. Find a good circle of people to share, brainstorm, laugh, or cry with. We are our own worst critics, so be kind to yourself, stop apologizing, and go shine! Together, there’s nothing we can’t build. View current openings on our careers site.
Clear: Enabling Seamless Tax Management for Millions of People with MongoDB Atlas
Building India's largest tax and financial services software platform trusted by more than six million Indians
With India’s large population and growing middle class, the country’s tax-paying population has been rising steadily. At the end of the financial year 2021-22, about 5.83 crore (58.3 million) individuals filed tax returns with the Indian Income Tax department. In addition, India also has about 13.8 million registered Goods and Services Tax (GST) taxpayers. When juxtaposed with growing digitization in India, this opens up massive demand for a convenient and effective platform to manage tax returns. Clear realized this need early on and launched in 2011 as a SaaS offering for individual ITR filing; it is currently trusted by more than six million Indians. It is second only to the Indian IT Department’s portal in terms of registered users. More recently, Clear has been focused on expanding its B2B portfolio, including launching an e-invoicing system. Today, the system supports about 50,000 tax professionals, one million small businesses, and 4,000 enterprises in GST filing.
How to ensure a seamless experience for all users at scale
Clear built the initial version of its B2B e-invoicing system on MySQL. However, as adoption grew, the team started to see the system’s limits tested. Certain batches of invoices were taking upwards of 25 minutes to process, a problem given the time-sensitive nature of tax filing. If any Clear customer failed to file in time, that customer could be given a penalty and labeled as non-compliant by the Indian government. The team knew they needed to take a step back and reevaluate the core structure of their system. The Clear team started the system rework by outlining a set of required capabilities. The new database system would need to be able to scale up quickly to handle periods of peak demand and down when traffic was low to save on costs. 
Tax professionals need to be able to see multiple cuts of the data at different levels, so the database would need to be able to support quick and complex aggregations. Lastly, the team knew that they didn’t want to be accountable for the management of the system themselves. They needed a fully-managed option.
MongoDB Atlas chosen for best in class scale and performance
The company ran a proof of concept (POC) study comparing MySQL’s performance with other competitive offerings, including MongoDB. It found that, in terms of the time taken to execute different batch sizes of data, MongoDB was considerably faster in all instances. For example, MongoDB’s processing time was 122% faster than the closest competitor and 767% faster than the farthest competitor.
Comparison of performance among databases
Given the document-based nature of invoices, the results of the POC made sense. With MongoDB, the Clear team could store invoice data together instead of splitting it across tables. This minimized the number of costly joins required to obtain data, leading to faster reads. MongoDB also allowed the team to easily split reads and writes in use cases where the system experienced high volumes of reads and where reading slightly stale data was permissible. Clear’s aggregation needs were also easily met with MongoDB’s aggregation pipeline. The combination of aggregation support and MongoDB’s full-text search capabilities meant that the Clear team could easily build filterable and searchable dashboards on top of their invoice data. Lastly, the team also loved the easy-to-use nature of MongoDB Atlas, MongoDB’s fully-managed developer data platform. With Atlas, the team could easily scale up and down their clusters on a schedule to match fluctuations in user traffic.
Achieving a 2900% jump in processing speed along with cost savings
After Clear replatformed from MySQL to MongoDB Atlas on AWS, their customers were shocked by the improvement. 
Pranesh Vittal, Director of Engineering, ClearTax India said, “We have achieved considerable optimization with MongoDB. Our customers are often surprised by the pace of execution. There is a significant improvement in the performance, with as much as a 2900% jump in processing speed in some instances.”
Comparing the performance of the new MongoDB powered platform
On top of increased speeds, the team is also saving money. “We’ve generated over 20 crore invoices to date running on a single sharded cluster with a 4TB disc,” said Pranesh Vittal. “The ability to store older data in cold storage [with Online Archive] helped us achieve this.” Atlas Triggers also help the team automatically scale down their clusters each night and scale them up in the morning. The triggers are fully-managed and schedule-based, so it’s as easy as setting them up and letting them run. This automatic right-sizing is saving the team upwards of $7000 each month ($700 per cluster for 10 clusters). After seeing such positive results, the team has since decided to replatform multiple other products onto MongoDB. “Here, MongoDB’s live support and consultation have proved very useful,” said Pranesh Vittal. Now, Clear manages 25+ clusters and over 10TB of data on MongoDB Atlas.
Build a ML-Powered Underwriting Engine in 20 Minutes with MongoDB and Databricks
The insurance industry is undergoing a significant shift from traditional to near-real-time data-driven models, driven by both strong consumer demand and the urgent need for companies to process large amounts of data efficiently. Data from sources such as connected vehicles and wearables is used to calculate precise and personalized premium prices, while also creating new opportunities for innovative products and services. As insurance companies strive to provide personalized and real-time products, the move towards sophisticated and real-time data-driven underwriting models is inevitable. To process all of this information efficiently, software delivery teams will need to become experts at building and maintaining data processing pipelines. This blog will focus on how you can revolutionize the underwriting process within your organization, by demonstrating how easy it is to create a usage-based insurance model using MongoDB and Databricks. This blog is a companion to the solution demo in our GitHub repository. In the GitHub repo, you will find detailed step-by-step instructions on how to build the data upload and transformation pipeline leveraging MongoDB Atlas platform features, as well as how to generate, send, and process events to and from Databricks. Let’s get started.
Part 1: The use case data model
Part 2: The data pipeline
Part 3: Automated decisions with Databricks
Part 1: The use case data model
Figure 1: Entity relationship diagram - Usage-based insurance example
Imagine being able to offer your customers personalized usage-based premiums that take into account their driving habits and behavior. To do this, you'll need to gather data from connected vehicles, send it to a Machine Learning platform for analysis, and then use the results to create a personalized premium for your customers. You’ll also want to visualize the data to identify trends and gain insights. 
This unique, tailored approach will give your customers greater control over their insurance costs while helping you to provide more accurate and fair pricing. A basic example data model to support this use case would include customers, the trips they take, the policies they purchase, and the vehicles insured by those policies. This example builds out three MongoDB collections, as well as two Materialized Views. The full Hackolade data model which defines all the MongoDB objects within this example can be found here.
Part 2: The data pipeline
Figure 2: The data pipeline - Usage-based insurance
The data processing pipeline component of this example consists of sample data, a daily materialized view, and a monthly materialized view. A sample dataset of IoT vehicle telemetry data represents the motor vehicle trips taken by customers. It’s loaded into the collection named ‘customerTripRaw’ (1). The dataset can be found here and can be loaded via MongoImport, or other methods. To create a materialized view, a scheduled Trigger executes a function that runs an Aggregation Pipeline. This then generates a daily summary of the raw IoT data, and lands that in a Materialized View collection named ‘customerTripDaily’ (2). Similarly for a monthly materialized view, a scheduled Trigger executes a function that runs an Aggregation Pipeline that, on a monthly basis, summarizes the information in the ‘customerTripDaily’ collection, and lands that in a Materialized View collection named ‘customerTripMonthly’ (3). For more info on these, and other MongoDB Platform Features: MongoDB Materialized Views; Building Materialized View on TimeSeries Data; MongoDB Scheduled Triggers; Cron Expressions.
Part 3: Automated decisions with Databricks
Figure 3: The data pipeline with Databricks - Usage-based insurance
The decision-processing component of this example consists of a scheduled trigger and an Atlas Chart. 
The scheduled trigger collects the necessary data and posts the payload to a Databricks MLflow API endpoint (the model was previously trained using the MongoDB Spark Connector on Databricks). It then waits for the model to respond with a calculated premium based on the miles driven by a given customer in a month. Then the scheduled trigger updates the ‘customerPolicy’ collection, to append a new monthly premium calculation as a new subdocument within the ‘monthlyPremium’ array. You can then visualize your newly calculated usage-based premiums with an Atlas Chart! In addition to the MongoDB Platform Features listed above, this section utilizes the following: MongoDB Atlas App Services; MongoDB Functions; MongoDB Charts.
Go hands on
Automated digital underwriting is the future of insurance. In this blog, we introduced how you can build a sample usage-based insurance data model with MongoDB and Databricks. If you want to see how quickly you can build a usage-based insurance model, check out our GitHub repository and dive right in!
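To make the daily materialized view from Part 2 more concrete, here is a minimal sketch of the kind of aggregation pipeline a scheduled trigger could run. It is not the pipeline from the GitHub repository: the field names `customerId`, `timestamp`, and `miles` are assumptions made for illustration, and only the collection names come from the article. The sketch is written in Python as plain dictionaries; an Atlas Function would express the same stages in JavaScript.

```python
from datetime import datetime, timezone


def daily_trip_summary_pipeline(since):
    """Build a pipeline that summarizes raw trips per customer per day.

    Field names (customerId, timestamp, miles) are hypothetical; adjust
    them to match the actual documents in 'customerTripRaw'.
    """
    return [
        # Only reprocess documents newer than the last run.
        {"$match": {"timestamp": {"$gte": since}}},
        # One output document per customer per calendar day.
        {
            "$group": {
                "_id": {
                    "customerId": "$customerId",
                    "day": {"$dateTrunc": {"date": "$timestamp", "unit": "day"}},
                },
                "totalMiles": {"$sum": "$miles"},
                "tripCount": {"$sum": 1},
            }
        },
        # Upsert into the materialized view; $merge matches on _id by default.
        {
            "$merge": {
                "into": "customerTripDaily",
                "whenMatched": "replace",
                "whenNotMatched": "insert",
            }
        },
    ]


pipeline = daily_trip_summary_pipeline(datetime(2023, 3, 1, tzinfo=timezone.utc))
```

With a driver such as PyMongo, a pipeline like this would be passed to `aggregate` on the ‘customerTripRaw’ collection. Using the composite `_id` as the merge key keeps reruns idempotent: rerunning the trigger for the same day replaces that day's summary instead of duplicating it.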
MongoDB Welcomes New Cohort of Community Champions and Enthusiasts
MongoDB is excited to announce our new Community Champions and Community Enthusiasts joining the Community Advocacy Program. This program is a global community of passionate and dedicated MongoDB advocates. Through it, members can grow their knowledge, profile, and leadership by engaging with the larger community and advocating for MongoDB technologies and our users. Community Champions and Community Enthusiasts keep everyone informed and excited about our latest developments and offerings. They're the connective tissue between MongoDB and the organizers, contributors, and creators who represent the backbone of our community. They share their knowledge and experiences with others through a variety of media and events. Community Advocacy Program members also uplevel their knowledge of MongoDB technologies and build personal skills in advocacy and community engagement by working closely with MongoDB staff. Members gain a variety of experiences and relationships that grow their professional stature as MongoDB practitioners and enable them to form meaningful bonds with community leaders. Returning Community Champion Nuri Halperin shared his experiences with the program: “Being part of the MongoDB Community Advocacy Program is a true honor. I get to work closely with people who are as enthusiastic as me about MongoDB, sharing our experiences and perspectives. It’s the place where we seriously geek-out, learn, and imagine new possibilities. This is the kind of deep engagement you can’t really get anywhere else.” To learn more about the Community Advocacy Program, please visit the MongoDB Community Advocacy Program page.
Queenly Builds New Formalwear Shopping Experience With Full Text Search Indexing
Two years ago, we profiled Queenly , a promising startup that's ushering in big changes to the formalwear industry by making it more accessible for everyday people. The San Francisco-based company operates a marketplace and search engine for buying and selling formalwear such as wedding dresses, prom dresses, special occasion attire, and wedding guest dresses. Four years removed from its successful launch, Queenly is now rolling out new social commerce features that co-founders Trisha Bantigue and Kathy Zhou hope will give users a forum to discuss fashion tips, share recommendations, and develop a community of like-minded friends. Ready to wear Zhou, who is also CTO of Queenly, chose MongoDB because she'd previously used it as a student at the University of Pennsylvania doing hackathons. "It was super easy to set up when I was just that starry eyed, 19-year-old kid that, honestly, didn't know anything about databases," she says. That simplicity remains a selling point for Zhou. "It's been really great to train our engineering team on MongoDB," Zhou says. "Even if they're a client-side engineer and don't have a background in databases." That ease of use will continue to pay off as the company scales and grows its technical team. Zhou's domain knowledge from working on search engines and recommendation systems at Pinterest led her to apply the advancements in algorithms and technology to the fashion industry. Full text search is a critical feature for building a truly personalized shopping experience that's tailored to the different life events that require formal wear. MongoDB Atlas Search is a fully integrated solution that makes it easy to add full text search with advanced functionality — fuzzy search, synonyms— to existing datastores. The simplicity of the out-of-the-box solution is huge for startups, Zhou says, because they're constantly growing and trying to structure their data along the way. 
"We have our own blended algorithms for ranking and delivering the most relevant search results to users, so plugging Atlas Search into our system helped fill in the user experience gaps when needed," Zhou says. "MongoDB was the right choice at the right time," she says. "When it comes to being able to do more complex querying and searching, MongoDB felt pretty easy." She also likes using NoSQL schemas and NoSQL databases because of the flexibility. Startups see so many different curveballs, she says, and so many different things they want to test and try, and having the flexibility to do that has really helped, according to Zhou. Data-driven differentiation Both Zhou and CEO Bantigue have experience in the fashion world and use that experience to customize their service to their audience. As we mentioned in our earlier profile, both grew up in low-income, immigrant households and entered beauty pageants as a way to earn tuition money. So they know the experience of needing to find the dress of your dreams but with limited resources. It's that lived experience that enables them to create a great UI/UX that treats customers the way they want to be treated. The co-founders, both 2022 Forbes 30 Under 30 honorees, combined their knowledge of the fashion industry with the ability to solve problems through data-driven methods to create differentiation in a crowded space. The search and indexing capabilities in MongoDB Atlas enable the Queenly application to curate a highly personalized visitor experience based on what you search for and spend time looking at. Normally, building new shopping categories or recommendation features would entail building a new data pipeline or data science infrastructure. Zhou says the compound filtering and indexing capabilities in MongoDB enable them to get new categories off the ground quickly and iterate as needed. 
“Communities on Queenly" has recently launched out of beta to all users, allowing them to ask each other questions like, "What kind of hairstyle should I wear for my wedding?" or "What kind of brands do you guys typically like?" Other interactive, social commerce features that Queenly’s engineering team was able to launch quickly with the help of MongoDB’s indexing features include a TikTok-style video feed and following feeds for user closets and brands.
Support for startups
Queenly is part of the MongoDB for Startups program, which helps startups build faster and scale further with free MongoDB Atlas credits, one-on-one technical advice, co-marketing opportunities, and access to a vast partner network. Zhou says the program has given them access to a level of specialized support that they wouldn't have had otherwise. "Clients our size might not get as much help as a really big company. I think it's really great that MongoDB for Startups exists so that us founders and small business owners can feel heard when it comes to just getting support," Zhou says. If you want to learn more about Queenly, check out queenly.com. To apply to become part of a growing team, visit queenly.com/jobs. Are you part of a startup and interested in joining the MongoDB for Startups program? Apply now.
MongoDB Atlas Integrations for CDKTF are now Generally Available
MongoDB Atlas Integrations for AWS CloudFormation and CDK are now Generally Available
MongoDB Atlas as the Data Hub for Smart Manufacturing with Microsoft Azure
All the source code used in this project, along with a detailed deployment guide, is available on our public GitHub page. Manufacturing companies are emerging from the pandemic with a renewed focus on digital transformation and smart factories investment. COVID-19 has heightened the need for Industrial IoT technology and innovation as consumers have moved towards online channels, forcing manufacturers to compete in a digitalized business environment. The manufacturing ecosystem can be viewed as a multi-dimensional grouping of systems designed to support the various business units in manufacturing organizations such as operations, engineering, maintenance, and learning & development functions. Process and equipment data is generated on the shop floor from machines and systems such as SCADA and then stored in a process historian or an operational database. The data originating from shop floor devices is generally structured time series data acquired through regular polling and sampling. Historians provide fast insertion rates of time series data, with capacities that reach up to tens of thousands of PLC tags processed per second. They rely on efficient data compression engines which can either be lossy or lossless. Traditional RDBMS storage comes packaged with manufacturing software applications such as a Manufacturing Execution System (MES). Relational databases are traditionally common in manufacturing systems, and thus the choice of database system for these manufacturing applications is typically driven by historical preferences. Manufacturing companies have long relied on using several databases and data warehouses to accommodate various transactional and analytical workloads. The strategy of separating operational and analytical systems has worked well so far and has caused the least interference with the operational process. 
However, this strategy will not fare well in the near future for two reasons. First, manufacturers are generating data of high volume, variety, and veracity using advanced IIoT platforms to create a more connected product ecosystem. The growth of IIoT data has been rapid; in fact, McKinsey and Company estimates that companies will spend over $175B in IIoT and edge computing hardware by 2025. Second, a traditional manufacturing systems setup necessitates the deployment and maintenance of several technologies, including graph databases (for asset digital models and relationships) and time series databases (for time series sensor data), and leads to IT sprawl across the organization. A complex infrastructure causes latency and delays in data access, which prevents real-time insights for improving manufacturing operations from being realized. To establish an infrastructure that can enable real-time analytics, companies need real-time access to data and information to make the right decision in time. Analytics can no longer be a separate process; it needs to be brought into the application. The applications have to be supplied with notifications and alerts instantly. This is where application-driven analytics platforms such as MongoDB Atlas come into the picture. We understand that to build smarter and faster applications, we can no longer rely on maintaining separate systems for different transactional and analytical workloads. Moving data between disparate systems takes time and energy and results in longer time to market and slower speed of innovation. Many of our customers start out using MongoDB as an operational database for both new cloud-native services as well as modernized legacy apps. More and more of these clients are now improving customer experience and speeding business insight by adopting application-driven analytics within the MongoDB Atlas platform. 
They use MongoDB to support use cases in real-time analytics, customer 360, Internet of Things (IoT), and mobile applications across all industry sectors. As mentioned before, the manufacturing ecosystem employs a lot of databases just to run production operations. Once IIoT solutions are added to the mix, each solution (shown in yellow in Figure 1) may come with its own database (time series, relational, graph, etc.) and the number of databases will increase dramatically. With MongoDB Atlas, this IT sprawl can be reduced as multiple use cases can be enabled using MongoDB Atlas (Figure 2). The versatility of the document model to structure data any way the application needs, coupled with an expressive API and indexing that allows you to query data any way you want, is a powerful value proposition. The benefits of MongoDB Atlas are amplified by the platform’s versatility to address almost any workload. Atlas combines transactional processing, application-driven analytics, relevance-based search, and mobile edge computing with cloud sync. These capabilities can be applied to almost every type of modern application being built for the digital economy by developers.
Figure 1: IT sprawl with IIoT and analytics solutions deployment in Manufacturing
Figure 2: MongoDB Atlas simplifying road to Smart Manufacturing
MongoDB and Hyperscalers leading the way for smart manufacturing
Manufacturers who are actively investing in digital transformation and IIoT are experiencing an exponential growth in data. All this data offers opportunities for new business models and digital customer experiences. To drive the right outcomes from all this data, manufacturers are setting up scalable infrastructures using hyperscalers such as Azure, AWS, and GCP. These hyperscalers offer a suite of components for efficient, scalable implementation of IIoT platforms. 
Companies are leveraging these accelerators to quickly build solutions, which help access, organize, and analyze previously untapped data from sensors, devices, and applications. In this article, we are focused on how MongoDB integrates with Microsoft Azure IoT modules and acts as the digital data hub for smart manufacturing use cases. MongoDB and Microsoft have been partners since 2019, and last year the partnership was expanded, enabling developers to build data-intensive applications within the Azure marketplace and Azure portal. This enables an enhanced developer experience and allows customers to burn down their Microsoft Azure Consumption Commitment. The alliance got a further boost when Microsoft included MongoDB as a partner in its newly launched Microsoft Intelligent Data Platform Ecosystem. MongoDB Atlas can be deployed in 35 regions in Azure and has seamless integration with most of the Azure Developer services (Azure Functions, App Services, ADS), Analytics services (Azure Synapse), Data Governance (Microsoft Purview), ETL (ADF), and cross-cutting services (AD, KMS, AKS, etc.), powering the building of innovative solutions.
Example scenario: Equipment failure prediction
Imagine a manufacturing facility that has sensors installed in its Computer Numerical Control (CNC) machines measuring parameters such as temperature, torque, rotational speed, and tool wear. A sensor gateway converts analog sensor data to digital values and pushes it to Azure IoT Edge, which acts as a gateway between the factory and the cloud. This data is transmitted to Azure IoT Hub, where the IoT Edge is registered as an end device. Once we have the data in the IoT Hub, Azure Stream Analytics can be utilized to filter the data so that only relevant information flows into the MongoDB Atlas cluster. The connection between Stream Analytics and MongoDB is done via an Azure Function. 
This filtered sensor data inside MongoDB is used for the following purposes: (1) to provide data for the machine learning model that will predict the root cause of machine failure based on sensor data; (2) to act as a data store for prediction results that can be utilized by business intelligence tools such as PowerBI using the Atlas SQL Interface; and (3) to store the trained machine learning model checkpoint in binary encoded format inside a collection. The overall architecture is shown in Figure 3.
Figure 3: Overall architecture
Workflow: The sensors in the factory are sending time series measurements to Azure IoT Hub. These sensors report the following for multiple machines: Product Type; Air Temperature (°C); Process Temperature (°C); Rotational Speed; Torque; Tool Wear (min). IoT Hub will feed this sensor data to Azure Stream Analytics, where the data will be filtered and pushed to MongoDB Atlas time series collections. The functionality of Stream Analytics can be extended by implementing machine learning models to do real-time predictive analytics on streaming input data. The prediction results can also be stored in MongoDB in a separate collection. The sensor data contains the device_id field, which helps us filter data coming from different machines. As MongoDB is a document database, we do not need to create multiple tables to store this data; in fact, we can just use one collection for all the sensor data coming from various devices or machines. Once the data is received in MongoDB, sum and mean values of sensor data will be calculated for the predefined production shift duration and the results will be pushed to MongoDB Atlas Charts for visualization. MongoDB time series window functions are used in an aggregation pipeline to produce the desired result. When a machine stoppage or breakdown occurs during the course of production, it may lead to downtime because the operator has to find out the cause of the failure before the machine can be put back into production. 
The sensor data collected from the machines can be used to train a machine learning model that can automatically predict the root cause when a failure occurs and significantly reduce the time spent on manual root cause finding on the shop floor. This can lead to increased availability of machines and thus more production time per shift. To achieve this goal, our first task is to identify the types of failures we want to predict. We can work with the machine owners and operators to identify the most common failure types and note them down. With this important step completed, we can identify the data sources that have relevant data about each failure type. If need be, we can update the Stream Analytics filter as well. Once the right data is identified, we train a Decision Tree Classifier model in Azure Machine Learning and store it as a binary value in a separate collection inside MongoDB. Atlas Scheduled Triggers are used to trigger the model (via an Azure Function), and the failure prediction results are written back into a separate Failures collection in MongoDB. The trigger schedule can be aligned to the production schedule so that it fires only when, for example, a changeover occurs. After a failure is detected, the operator and supervisor need to be notified immediately. Using App Services, a mobile application is developed to send notifications and alerts to the floor supervisor and machine operator once a failure root cause is predicted. Figure 4 shows the mobile app user interface, where the user has an option to acknowledge the alert. Thanks to Atlas Device Sync, even when the mobile device is facing unreliable connectivity, the data stays in sync between the Atlas cluster and the Realm database in the app. MongoDB’s Realm is an embedded database technology already used on millions of mobile phones as part of mobile apps, as well as in infotainment-like systems. 
Figure 4: Alert app user interface
Business benefits of using MongoDB Atlas as smart manufacturing data hub
Scalability: MongoDB is a highly scalable document-based database that can handle large amounts of structured, semi-structured, and unstructured data. Native time series collections are available that help with storing large amounts of data generated by IIoT-enabled equipment in a highly compressed manner.
Flexibility: MongoDB stores data in a flexible, JSON-like format, which makes it easy to store and query data in a variety of ways. This flexibility makes it well-suited for handling the different data structures needed to store sensor data, ML models, and prediction results, all in one database. This removes the need for maintaining separate databases for each type of data, reducing IT sprawl in manufacturing organizations.
Real-time Analytics: As sensor data comes in, MongoDB aggregation pipelines can help in generating features to be used for machine learning models. Atlas Charts can be set up in minutes to visualize important features and their trends in near real time.
BI Analytics: Analysts can use the Atlas SQL interface to access MongoDB data from SQL-based tools. This allows them to work with rich, multi-structured documents without defining a schema or flattening data. In a connected factory setting, this can be useful to generate reports for failures over a period of time and comparisons between different equipment failure types. Data can be blended from MongoDB along with other sources of data to provide a 360-degree view of production operations.
Faster Mobile Application Development: Atlas Device Sync bidirectionally connects and synchronizes Realm databases inside mobile applications with the MongoDB Atlas backend, leading to faster mobile application development and less time needed for maintenance of deployed applications.
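The shift-level sum and mean calculation mentioned in the workflow can be sketched with the `$setWindowFields` aggregation stage. This is an illustrative sketch rather than the pipeline from the project's repository: the `device_id` field comes from the article, while the `timestamp` and `torque` field names and the 8-hour shift length are assumptions. A small pure-Python reference is included so the intended trailing-window behavior can be checked on sample data without a database.

```python
from datetime import datetime, timedelta

SHIFT_HOURS = 8  # assumed shift length; the article does not state one


def shift_stats_stage(field, hours=SHIFT_HOURS):
    """Build a $setWindowFields stage computing trailing sum and mean per device."""
    window = {"range": [-hours, "current"], "unit": "hour"}
    return {
        "$setWindowFields": {
            "partitionBy": "$device_id",   # one window series per machine
            "sortBy": {"timestamp": 1},    # hypothetical date field name
            "output": {
                f"{field}_sum": {"$sum": f"${field}", "window": window},
                f"{field}_mean": {"$avg": f"${field}", "window": window},
            },
        }
    }


def trailing_stats(readings, field, hours=SHIFT_HOURS):
    """Pure-Python reference for the same trailing window on a small sample."""
    results = []
    for reading in readings:
        start = reading["timestamp"] - timedelta(hours=hours)
        values = [
            other[field]
            for other in readings
            if other["device_id"] == reading["device_id"]
            and start <= other["timestamp"] <= reading["timestamp"]
        ]
        results.append({"sum": sum(values), "mean": sum(values) / len(values)})
    return results


sample = [
    {"device_id": "cnc-1", "timestamp": datetime(2023, 3, 1, 6, 0), "torque": 40.0},
    {"device_id": "cnc-1", "timestamp": datetime(2023, 3, 1, 10, 0), "torque": 50.0},
    {"device_id": "cnc-1", "timestamp": datetime(2023, 3, 1, 20, 0), "torque": 60.0},
    {"device_id": "cnc-2", "timestamp": datetime(2023, 3, 1, 10, 0), "torque": 30.0},
]
stage = shift_stats_stage("torque")
stats = trailing_stats(sample, "torque")
```

In this sample, the second reading for `cnc-1` falls within eight hours of the first, so its window covers both values, while the third reading is far enough away to stand alone. A fixed shift calendar (for example 06:00 to 14:00) would instead call for grouping on a computed shift boundary rather than a trailing window.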
Conclusion
The MongoDB Atlas developer data platform is designed and engineered to help speed up your journey towards smart manufacturing. It is suitable not just for high-speed time series workloads but also for workloads that power mobile applications and BI dashboards, leading to smarter applications, increased productivity, and eventually smarter factories.
Learn more
All the source code used in this project, along with a detailed deployment guide, is available on our public GitHub page. To learn more about how MongoDB enables IIoT for our customers, please visit our IIoT use cases page. Get started today with the MongoDB Atlas listing on Azure Marketplace.