Hot Off the Press: MongoDB Launches Two New Books at MongoDB.local London
It’s well known that developers today face immense demand to build new, modern applications at an accelerated pace. In fact, recent data from IDC predicts that by 2025 there will be a shortfall of 4 million developers. As the industry moves toward more complex and advanced technologies, developers remain the foundation, as well as the key, to expanding emerging, innovative technologies, from AI to IoT and other automation applications. The skills required to develop these technologies are becoming increasingly specialized. In turn, the need for accessible resources, training, and education is only becoming more pressing. That’s why we’re delighted to launch MongoDB Press, our very own official series of educational books, penned by a mix of our in-house experts and trusted industry voices and covering both technical and strategic topics. We’re thrilled that the first two books in the series, Mastering MongoDB 7.0 and Practical MongoDB Aggregations, will be launched at MongoDB .local London 2023. Practical MongoDB Aggregations was written by our very own Executive Solutions Architect, Paul Done, and is intended for developers with a baseline understanding of the MongoDB Aggregation Framework; it will help readers learn how to build effective aggregation pipelines. You can get 20% off now by visiting mongodb.com/books. Attendees at MongoDB .local London will be able to get signed copies of the book while supplies last. Mastering MongoDB 7.0 is available for pre-order and was written by a team of MongoDB experts. It provides a deep dive into the latest features of MongoDB. By the end of the book, readers will have gained the practical understanding required to design, develop, administer, and scale MongoDB-based database applications, both on-premises and in the cloud.
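For readers curious what the book covers, here is a small taste of an aggregation pipeline, expressed as the Python dictionaries you would pass to a driver such as PyMongo. The collection and field names are hypothetical, chosen purely for illustration:

```python
# A hypothetical pipeline over an "orders" collection: keep completed
# orders, total revenue per product, and sort by the highest earners.
pipeline = [
    {"$match": {"status": "complete"}},
    {"$group": {
        "_id": "$product_id",
        "total_revenue": {"$sum": "$amount"},
        "order_count": {"$sum": 1},
    }},
    {"$sort": {"total_revenue": -1}},
]
```

With PyMongo this would run as `db.orders.aggregate(pipeline)`; the book walks through composing far richer pipelines than this sketch.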
Being Latine in Tech: Two MongoDB Employees Share Their Advice on Building Careers in Engineering
Ashley Naranjo and Martin Bajana, members of MongoDB’s employee resource group QueLatine, share their career journeys and offer insight into how other members of the Latine community can build careers in tech. Jackie Denner: How did you make your way into the tech industry? Ashley Naranjo: I am a first-generation Latina with a passion for Information Technology and a knack for problem-solving. After graduating early from high school, I embarked on a career in Nursing. I chose Nursing initially because I wanted to make a difference and help others, but my path took an unexpected turn when COVID-19 reshaped our world. In light of the circumstances, I reevaluated my options and decided to seize an opportunity with a program called Year Up. During the intensive six-month training and deployment phase, I not only completed rigorous coursework but also obtained IT Google Coursera certifications and actively pursued CompTIA certifications. This experience allowed me to secure an internship at Meta (Facebook) as an Enterprise Operation IT Support Tech, where my love for technology blossomed. During my time at Meta, I had the privilege of assisting diverse Meta users worldwide with a wide range of technical issues, including troubleshooting, software and hardware support, internal access permissions, and more. The exposure to a global tech environment further fueled my passion for the field. When my internship concluded, I was offered a 1-year contract role with Meta to continue my work as a support tech for the same team. Throughout that year, I immersed myself in all aspects of technology, maximizing my learning opportunities and applying my networking skills. As time went on, I knew I needed a new challenge. This led me to embark on a search for an exciting role, which eventually brought me to MongoDB. I am passionate about driving technological innovation, and MongoDB is a place where I can make an impact.
Martin Bajana: My interest in technology stems from a variety of sources. From a young age, I developed a strong passion for video games and exploring new technologies. Whether it was experimenting with the latest gaming consoles or delving into computer hardware, I relished the opportunity to learn and understand the inner workings of these technologies. In school, I discovered my affinity for mathematics, which further solidified my decision to pursue a career in the tech industry. Choosing to study computer science in college was a natural progression for me, as it allowed me to combine my love for technology with my aptitude for problem-solving. After completing my education, I was recruited by Verizon, where I worked on front-end applications and Android development. Although the transition was initially challenging, I persevered and regained my confidence. It was during this period that I realized a career in technology was my long-term aspiration. Throughout my tenure at Verizon, I embraced opportunities to work across various teams, acquiring valuable experience and honing my skills. Eventually, I made the decision to join MongoDB, which has provided me with an enriching journey and the chance to shape my career in the tech industry. JD: Have there been any challenges you've faced throughout your career? AN: Imposter syndrome has been a significant challenge for me throughout my career, and it's something I still deal with to this day. When surrounded by my talented colleagues, I would often compare myself to them and focus on my perceived weaknesses and flaws, leading to a lack of self-confidence. However, I tackled this issue by addressing my feelings with my manager. Her support and guidance helped me realize my own potential and acknowledge my accomplishments. Maintaining a positive mindset has enabled me to view myself as a competent engineer and recognize the value I bring to my team. 
I have learned to take ownership of my successes and embrace opportunities for growth. Stepping out of my comfort zone has become a regular practice, as personal and professional development often stems from embracing challenges and discomfort. By giving myself permission to take up space and be confident in my abilities, I have been able to overcome imposter syndrome and continue to thrive in my role. MB: I have been fortunate enough to work for companies and teams that value and respect me for the work I deliver. Being in the tech industry and growing up in a culturally diverse region of the country, I have had exposure to individuals from various backgrounds and identities, which has made me more comfortable as a Latinx individual in the industry. My personal goal is to promote a work environment where everyone is judged based on the contributions they bring to the team, rather than their identity. I believe in supporting and respecting the identities of my peers and coworkers while fostering a culture of inclusivity and equality. JD: How has MongoDB supported your career growth and development? AN: In my time working at MongoDB, I have experienced exceptional support that has greatly contributed to my professional development and growth. As an engineer at MongoDB, I have been provided with numerous opportunities to expand my knowledge and skills through participation in tech talks, hackathons, and continuous learning about emerging technologies. I am grateful for the proactive approach taken by my manager and team leaders in fostering my growth as an engineer. Additionally, MongoDB's commitment to diversity and inclusion is evident through the company's DEI initiatives. Platforms like our employee resource group “QueLatine” have made me feel a stronger sense of connection and belonging, particularly among my Latinx peers. By recognizing the power of our diverse backgrounds and experiences, MongoDB empowers us to have a meaningful impact in the industry. 
MB: I have experienced full support from my leader since day one. They have proactively sought to understand my career goals and have helped me create a clear career path to achieve those goals. This level of support has enabled me to take on challenging projects and initiatives within the company, allowing me to grow and develop in my career. Furthermore, MongoDB offers a wealth of learning and development resources to its employees, which I have fully utilized to continue learning and growing my skill set. JD: What is your advice for other Latines who want to begin careers in tech? AN: Having made a significant career change myself, I can empathize with the challenges that come with exploring new paths, particularly in the tech industry. As a Latina in tech, I feel a strong desire to encourage and raise awareness within our community about the incredible resources and opportunities that are available to us. My advice to others who may be considering a similar journey is to prioritize the continuous development of your technical skills, actively seek out mentoring opportunities, push yourself beyond your comfort zone by honing your networking abilities, and most importantly, believe in yourself and your ability to achieve great things! MB: Navigating the vast world of technology can certainly be overwhelming, but it's important not to fear feeling lost. Even after 12 years in this career, there are still days where I come across something I've never heard of before. Fortunately, we live in a world abundant with resources for continuous learning. My advice is to take the time to explore and ask questions. Seek out open-source projects that you can contribute to, and connect with other professionals in the tech industry who can share their experiences and provide guidance. Additionally, taking advantage of hackathons and other tech events can expose you to new technologies and ideas. Don't be afraid to make mistakes, and most importantly, don't give up! 
Join us in transforming the way developers work with data. Build your tech career at MongoDB.
Fusing MongoDB and Databricks to Deliver AI-Augmented Search
With customers' attention increasingly dispersed across channels, platforms, and devices, competition in the retail industry is relentless. The customer’s search experience on your storefront is the cornerstone of capitalizing on your Zero Moment of Truth, the point in the buying cycle where the consumer's impression of a brand or product is formed. Imagine a customer, Sarah, eager to buy a new pair of hiking boots. Instead of wandering aimlessly through pages and pages of search results, she expects to find her ideal pair easily. The smoother her search, the more likely she is to buy. Yet achieving this seamless experience isn't a walk in the park for retailers. Enter the dynamic duo of MongoDB and Databricks. By equipping their teams with this powerful tech stack, retailers can harness the might of real-time in-app analytics. This not only streamlines the search process but also infuses AI and advanced search functionalities into e-commerce applications. The result? An app that not only meets Sarah's current expectations but anticipates her future needs. In this blog, we’ll walk through the main reasons to implement an AI-augmented search solution by integrating both platforms. Let’s dive in! A solid foundation for your data model For an e-commerce site built around the principles of an Event-Driven and MACH Architecture, the data layer will need to ingest and transform data from a number of different sources. Heterogeneous data, such as the product catalog, user behavior on the e-commerce front end, comments and ratings, search keywords, and customer lifecycle segmentation, is all necessary to personalize search results in real time. This increases the need for a flexible model such as MongoDB’s documents and a platform that can easily take in data from a number of different sources, from APIs and CSV files to Kafka topics through the MongoDB Kafka Connector.
MongoDB's translytical capabilities, combining transactional (OLTP) and analytical (OLAP) workloads, offer real-time data processing and analysis, enabling you to simplify your workloads while ensuring timely responsiveness and cost-effectiveness. Now that the data platform is servicing the operational needs of the application, what about adding in AI? Combining MongoDB with Databricks using the MongoDB Spark Connector allows you to easily train your models on your operational data from MongoDB and to trigger them to run in real time, augmenting your application as the customer is using it. Centralization of heterogeneous data in a robust yet flexible Operational Data Layer The foundation of an effective e-commerce data layer lies in a solid yet flexible operational data platform. With one in place, orchestrating ML models to run at specific timeframes or in response to different events, along with the crucial data transformation, metadata enrichment, and data featurization they require, becomes a simple, automated task for optimizing search results pages and delivering a frictionless purchasing process. Check out this blog for a tutorial on achieving near real-time ingestion using the Kafka Connector with MongoDB Atlas, and data processing with Databricks Spark User Defined Functions. Adding relevance to your search engine results pages To achieve optimal product positioning on the Search Engine Results Page (SERP) after a user performs a query, retailers are challenged with creating a business score for their products' relevance. This score incorporates various factors such as stock levels, competitor prices, and price elasticity of demand. These business scores are complex real-time analyses calibrated against many factors, making them a perfect use case for AI.
Adding AI-generated relevance to your SERPs can accurately predict and display the search results most relevant to users' queries, leading to higher engagement and increased click-through rates, while also helping businesses optimize their content for the operational context of their markets. Ingestion into the MongoDB Atlas document model laid the groundwork for this challenge; by leveraging the MongoDB Apache Spark Streaming Connector, companies can persist their data into Databricks, taking advantage of its capabilities for data cleansing and complex data transformations, making it the ideal framework for delivering batch training and inference models. Diagram of the full architecture integrating MongoDB Atlas and Databricks for an e-commerce store, real-time analytics, and search MongoDB App Services act as the mortar of our solution, overlaying the intelligence layer in an event-driven way, making it not only real-time but also cost-effective and rendering both your applications and business processes nimble. Make sure to check out this GitHub repository to understand in depth how this is achieved. Data freshness Once that business score can be calculated comes the challenge of delivering it through the search feature of your application. With MongoDB Atlas's native workload isolation, operational data is continuously available on dedicated analytics nodes deployed in the same distributed cluster and exposed to analysts within milliseconds of being stored in the database. But data freshness is not only important for analytics use cases: by combining operational data with the analytics layer, retailers power in-app analytics and build amazing user experiences across customer touch points.
With MongoDB Atlas Search's advanced features such as faceted search, auto-complete, and spell correction, retailers can rest assured of a more intuitive and user-friendly search experience, not only for their customers but also for their developers, as bundling all of these functionalities in the same platform minimizes the tax of operational complexity. App-driven analytics is a competitive advantage over traditional warehouse analytics Additionally, the search functionality is optimized for performance, enabling businesses to handle high search query volumes without compromising user experience. The business score generated from the AI models trained and deployed with Databricks provides the central discriminator for where in the SERPs any specific product appears, fueling your search engine's relevance and securing the delivery of a high-quality user experience. Conclusion Search is a key part of the buying process for any customer. Showing customers exactly what they are looking for without their investing too much time in the browsing stage reduces friction in the buying process, but as we’ve seen, it isn't always easy technically. Empower your teams with the right tech stack to take advantage of the power of real-time in-app analytics with MongoDB and Databricks. It’s the simplest way to build AI and search capabilities into your e-commerce app and to respond to current and future market expectations. Check out the video below and this GitHub repository for all the code needed to integrate MongoDB and Databricks and deliver a real-time machine-learning solution for AI-augmented search.
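To illustrate how a model-generated business score can influence ranking, here is a hedged sketch of an Atlas Search stage that multiplies the full-text relevance score by a precomputed businessScore field of the kind a Databricks pipeline might persist. The index name, field names, and scoring details are all assumptions for this example, shown as the Python dictionaries you would pass to a driver:

```python
# Hypothetical $search stage over a "products" index: text relevance
# on name/description, multiplied by a stored businessScore so that
# in-stock, well-priced products rank higher on the SERP.
search_stage = {
    "$search": {
        "index": "products",                 # hypothetical index name
        "text": {
            "query": "hiking boots",
            "path": ["name", "description"],
            "score": {
                "function": {
                    "multiply": [
                        {"score": "relevance"},
                        # Fall back to 1.0 when the field is absent.
                        {"path": {"value": "businessScore", "undefined": 1.0}},
                    ]
                }
            },
        },
    }
}
pipeline = [search_stage, {"$limit": 10}]
```

Run with `db.products.aggregate(pipeline)`; because the score is a plain field on each document, the event-driven pipeline can refresh it without touching the query.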
Why Queryable Encryption Matters to Developers and IT Decision Makers
Enterprises face new challenges in protecting data as modern applications constantly change requirements. There are new technologies, advances in cryptography, regulatory constraints, and architectural complexities. The threat landscape and attack techniques are also changing, making it harder for developers to be experts in data protection. Client-side field level encryption, sometimes referred to as end-to-end encryption, provides another layer of security that enables enterprises to protect sensitive data. Although client-side encryption fulfills many modern requirements, architects and developers face challenges in implementing these solutions to protect their data efficiently, for several reasons: Multiple cryptographic tools to choose from — identifying the relevant libraries, selecting the appropriate encryption algorithms, configuring the selected algorithms, and correctly setting up the API for interaction are some of the challenges around tools. Encryption key management challenges — how and where to store the encryption keys, how to manage access, and how to manage the key lifecycle, such as rotation and revocation. Customizing applications — developers might have to write custom code to encrypt, decrypt, and query the data, requiring widespread application changes. With Queryable Encryption now generally available, MongoDB helps customers protect data throughout its lifecycle: data is encrypted at the client side and remains encrypted in transit, at rest, and in use, including in memory, logs, and backups. Also, MongoDB is the only database provider that allows customers to run rich queries on encrypted data, just as they can on unencrypted data. This is a huge advantage for customers, as they can query and secure their data confidently. Why does Queryable Encryption matter to IT decision-makers and developers?
Here are a few reasons: Security teams within enterprises deal with protecting their customers’ sensitive data — financial records, personal data, medical records, and transaction data. Queryable Encryption provides a high level of security: by encrypting sensitive fields on the client side, the data remains encrypted while in transit, at rest, and in use, and is only ever decrypted back at the client. With Queryable Encryption, customers can run expressive queries on encrypted data using an industry-first fast, encrypted search algorithm. This allows the server to process and retrieve matching documents without understanding the data or why a document should be returned. Queryable Encryption was designed by the pioneers of encrypted search, with decades of research and experience in cryptography, and uses NIST-standard cryptographic primitives such as AES-256, SHA-2, and HMACs. Queryable Encryption allows a faster and easier development cycle: developers can easily encrypt sensitive data without making changes to their application code by using the language-specific drivers provided by MongoDB. No crypto experience is required, and it’s intuitive and easy for developers to set up and use. Developers need not be cryptography experts to encrypt, format, and transmit the data, and they don't have to figure out which algorithms or encryption options to use to implement a secure encryption solution. MongoDB has built a comprehensive encryption solution, including key management. Queryable Encryption helps enterprises meet strict data privacy requirements such as HIPAA, GDPR, CCPA, PCI, and more using strong data protection techniques. It offers customer-managed and controlled keys. The MongoDB driver handles all cryptographic operations and communication with the customer-provisioned key provider. Queryable Encryption supports AWS KMS, Google Cloud KMS, Azure Key Vault, and KMIP-compliant key providers.
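To make the developer experience concrete, here is a hedged sketch, with hypothetical collection and field names, of the encrypted fields definition that drives Queryable Encryption. In a real deployment each field also references a data key provisioned through your KMS provider, and the driver is configured for automatic encryption:

```python
# Hypothetical encrypted fields map for a "patients" collection.
# The ssn field is encrypted client-side yet remains queryable for
# equality matches; medicalRecords is encrypted but not queryable.
encrypted_fields = {
    "fields": [
        {
            "path": "ssn",
            "bsonType": "string",
            "queries": [{"queryType": "equality"}],
        },
        {
            # No "queries" entry: encrypted, never searched server-side.
            "path": "medicalRecords",
            "bsonType": "array",
        },
    ]
}

# With automatic encryption configured on the driver, the application
# query reads exactly like one over plaintext data:
query = {"ssn": "123-45-6789"}
```

The server matches documents without ever seeing the plaintext SSN; the driver encrypts the query value and decrypts the results transparently.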
MongoDB also provides APIs for key rotation and key migration that customers can leverage to make key management seamless. Note that the equality query type is supported in the 7.0 GA release, with automatic encryption enabled. For more information on Queryable Encryption, refer to the following resources: Queryable Encryption documentation Queryable Encryption FAQ Download drivers Queryable Encryption Datasheet
View and Analyze Your Monthly MongoDB Atlas Usage with Cost Explorer
In today's macroeconomic climate, knowing where your money's going is a big deal. From optimizing costs to boosting efficiency, understanding your software expenses can be a total game-changer for your business. That’s why we’re excited to announce the release of Cost Explorer in MongoDB Atlas. Cost Explorer is a new visual interface available in the Billing section of the Atlas UI that helps you view and analyze your monthly MongoDB Atlas usage in one convenient location. How can Cost Explorer help you? Cost Explorer allows you to easily filter your Atlas usage data by what’s most important to you and your business, with filters to segment your view by organization (if you have cross-org billing enabled), projects, clusters, or services, within a time window of up to 18 months. With Cost Explorer, you can quickly pinpoint trends or outliers in your month-over-month usage to identify opportunities to improve or optimize your Atlas usage going forward. If you’re looking for additional customization beyond what is available in Cost Explorer, you can also create your own billing dashboards in Atlas Charts that are fully tailored to your needs. Cost Explorer is viewable by any Atlas user assigned the Organization Owner, Billing Admin, or Organization Billing Viewer role. To learn more about Cost Explorer and how to manage your Atlas billing, view our documentation on managing billing.
How MongoDB and Alibaba Cloud are Powering the Era of Autonomous Driving
The emergence of autonomous driving technologies is transforming how automotive manufacturers operate, with data taking center stage in this transformation. Manufacturers are now not only creators of physical products but also stewards of vast amounts of product and customer data. As vehicles become connected vehicles, automotive manufacturers are compelled to transform their business models into those of software-first organizations. The data generated by connected vehicles is used to create better driver assistance systems and paves the way for autonomous driving applications. Notably, the journey toward autonomous vehicles is not just about building reliable vehicles but about harnessing the power of connected vehicle data to create a new era of mobility that seamlessly integrates cutting-edge software with vehicle hardware. The ultimate goal of autonomous vehicle makers is to produce cars that are safer than human-driven vehicles. Since 2010, investors have poured over 200 billion dollars into autonomous vehicle technology. Even with this level of investment, it is very challenging to create fully autonomous vehicles that can drive more safely than humans. Some experts estimate that the technology to achieve level 5 autonomy is about 80% developed, but the last 20% will be extremely hard to achieve and will take a long time to perfect. Unusual events such as extreme weather, wildlife crossings, and highway construction are still enigmas for many automotive companies to solve. The answer to these challenges is not straightforward. AI-based image and object recognition still has a long way to go in dealing with uncertainties on the road. One thing is certain, however: automotive manufacturers need to make use of the data captured by radar, LiDAR, camera systems, and the vehicle's whole telemetry system in order to train their AI models better. A modern vehicle is a data powerhouse.
It constantly gathers and processes information from onboard sensors and cameras. The Big Data generated as a result presents a formidable challenge, requiring robust storage and analysis capabilities. Additionally, this time series data needs to be analyzed in real time, and decisions have to be made instantaneously in order to guarantee safe navigation. Furthermore, ensuring data privacy and security is another hurdle to cross, since self-driving vehicles need to be shielded from cyber attacks that could cause life-threatening events. The development of high-definition (HD) maps to help the vehicle ‘see’ what is on the road also poses technical challenges. Such maps are developed using a combination of different data sources such as Global Navigation Satellite Systems (GNSS), radar, IMUs, cameras, and LiDAR. An error in any one of these systems accumulates and ultimately impacts the accuracy of navigation. A data platform is needed between the data sources (vehicle systems) and the AI platform to accommodate and consolidate this diverse information while keeping it secure. The data platform should be able to preprocess this data, as well as add additional context to it, before using it to train or run AI modules such as object detection, semantic segmentation, and path planning. MongoDB can play a significant role in addressing the data-related challenges of autonomous driving outlined above. The document model is an excellent way to accommodate diverse data types such as sensor readings, telematics, maps, and model results. New fields can be added to documents at run time, enabling developers to easily add context to the raw telemetry data. MongoDB’s ability to handle large volumes of unstructured data makes it suitable for the constant influx of vehicle-generated information.
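As a sketch of what this looks like in practice, the Python dictionaries below show a hypothetical sensor reading, the options for a time series collection holding such readings, and an aggregation pipeline computing a rolling average over the telemetry. All names and values are illustrative assumptions, not a prescribed schema:

```python
from datetime import datetime, timezone

# A hypothetical sensor reading in the document model. New fields,
# such as the enriched "context" block below, can be added at run
# time without a schema migration.
reading = {
    "vehicle_id": "VIN-0001",
    "ts": datetime(2023, 9, 1, 12, 0, 5, tzinfo=timezone.utc),
    "lidar": {"point_count": 118_000},
    "speed_kmh": 62.4,
}
reading["context"] = {"weather": "rain", "road": "construction"}  # enrichment

# Options for a time series collection storing these readings,
# as passed to create_collection(..., timeseries=ts_options).
ts_options = {"timeField": "ts", "metaField": "vehicle_id", "granularity": "seconds"}

# A window-function pipeline: 30-second moving average speed per vehicle.
pipeline = [
    {"$setWindowFields": {
        "partitionBy": "$vehicle_id",
        "sortBy": {"ts": 1},
        "output": {
            "avg_speed_kmh": {
                "$avg": "$speed_kmh",
                "window": {"range": [-30, 0], "unit": "second"},
            }
        },
    }}
]
```

Because the window is defined over the time field, the same pipeline keeps working as new sensors add fields to the documents.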
MongoDB is not only an excellent choice for data storage but also provides comprehensive data pre-processing capabilities through its aggregation framework. Its support for time series window functions allows data scientists to produce calculations over a sorted set of documents. Time series collections also dramatically reduce storage costs: columnar compression significantly reduces the data's overall footprint on disk and improves read performance. MongoDB offers robust security features such as role-based access control, encryption at rest and in transit, comprehensive auditing, and field-level redaction and encryption, down to client-side field-level encryption, all of which help shield sensitive data from potential cyber threats while ensuring compliance with data protection regulations. For the challenges of effectively storing and querying HD maps, MongoDB’s geospatial features can aid in querying location-based data and in combining map information with telemetry data, fulfilling the continuous update and accuracy requirements of mapping. Furthermore, MongoDB's horizontal scaling, or sharding, allows for the seamless expansion of storage and processing capabilities as the volume of data grows. This scalability is essential for handling the data streams generated by fleets of self-driving vehicles. During the research and development of autonomous driving projects, scalable infrastructure is required to quickly and steadily collect and process massive amounts of data. In such projects, data is generated at the terabyte level every day. To meet these needs, Alibaba Cloud provides a solution that integrates data collection, transmission, storage, and computing. In this solution, the data collected daily by sensors can be simulated and collected using Alibaba Cloud Lightning Cube and sent to the Object Storage Service (OSS).
Context is added to this data using a translator, and then this contextualized information can be pushed to MongoDB to train models. MongoDB and Alibaba Cloud recently announced a four-year extension to their strategic global partnership, which has seen significant growth since being announced in 2019. Through this partnership, automotive manufacturers can easily set up and use MongoDB as a service, ApsaraDB for MongoDB, from Alibaba Cloud’s data centers globally. Figure 1: Data collection and model training data link with MongoDB on Alibaba Cloud. When the vehicle is on the road, telemetry data is captured through an MQTT gateway, published to Kafka, and then pushed into MongoDB for storage and archiving. This data can be used for various applications such as real-time status updates for the engine and battery, accident analysis, and regulatory reporting. Figure 2: Mass-production vehicle data link with MongoDB on Alibaba Cloud For a company looking to build autonomous driving assistance systems, Alibaba Cloud’s ApsaraDB for MongoDB is an excellent technology partner. ApsaraDB for MongoDB can handle terabytes of diverse sensor data from cars on a daily basis, data that doesn't conform to a fixed format. MongoDB provides reliable and highly available storage for this heterogeneous data, enabling companies to rapidly expand their systems within minutes and saving time when processing and integrating autonomous driving data. By leveraging Alibaba Cloud's ApsaraDB for MongoDB, R&D teams can focus on innovation rather than worrying about data storage and scalability, contributing to faster innovation in the field of autonomous driving. In summary, MongoDB's flexibility, versatility, scalability, real-time capabilities, and strong security framework make it well suited to address the multifaceted data requirements and challenges that autonomous driving presents.
By efficiently managing and analyzing the Big Data generated, MongoDB and Alibaba Cloud are paving the way toward reliable and safe self-driving technology. To learn more about MongoDB’s role in the automotive industry, please visit our manufacturing and automotive webpage.
Building AI with MongoDB: Unlocking Value from Multimodal Data
One of the most powerful capabilities of AI is its ability to learn, interpret, and create from input data of any shape and modality, from structured records stored in a database to unstructured text, computer code, video, images, and audio streams. Vector embeddings are one of the key AI enablers in this space. Encoding our data as vector embeddings dramatically expands the ability to work with this multimodal data. We’ve gone from depending on data scientists training highly specialized models just a few years ago to developers today building general-purpose apps incorporating NLP and computer vision. The beauty of vector embeddings is that data that is unstructured, and therefore completely opaque to a computer, can now have its meaning and structure inferred and represented via these embeddings. Using a vector store such as Atlas Vector Search means we can search and compute over unstructured and multimodal data in the same way we’ve always been able to with structured business data. Now we can search for it using natural language rather than specialized query languages. Considering that 80%+ of the data that enterprises create every day is unstructured, we start to see how vector search combined with LLMs and generative AI opens up new use cases and revenue streams. In this latest round-up of companies building AI with MongoDB, we feature three that are doing just that. The future of business data: Unlocking the hidden potential of unstructured data In today's data-driven world, businesses are always searching for ways to extract meaningful insights from the vast amounts of information at their disposal. From improving customer experiences to enhancing employee productivity, the ability to leverage data enables companies to make more informed and strategic decisions. However, most of this valuable data is trapped in complex formats, making it difficult to access and analyze. That's where Unstructured.io comes in.
Imagine an innovative tool that can take all of your unstructured data – be it a PDF report, a colorful presentation, or even an image – and transform it into an easily accessible format. This is exactly what Unstructured.io does. It delves deep, pulls out crucial data, and presents it in a simple, universally understood JSON format. This makes your data ready to be transformed, stored, and searched in powerful databases like MongoDB Atlas Vector Search . What does this mean for your business? It's simple. By automating the data extraction process, you can quickly derive actionable insights, offering enhanced value to your customers and improving operational efficiencies. Unstructured will also soon offer an image-to-text model. This provides even more flexibility for users to ingest and process nearly any file containing natural language data. And, keep an eye out for notable upgrades in table extraction – yet another step in ensuring you get the most from your data. Unstructured.io isn't just a tool for tech experts. It's for any business aiming to understand its customers better, seeking to innovate, and looking to stay ahead in a competitive landscape. Unstructured’s widespread usage is a testament to its value – with over 1.5 million downloads and adoption by thousands of enterprises and government organizations. Brian Raymond, the founder and CEO of Unstructured.io, perfectly captures this synergy, saying, “As the world’s most widely used natural language ingestion and preprocessing platform, partnering with MongoDB was a natural choice for us. This collaboration allows for even faster development of intelligent applications. Together, we're paving the way for businesses to harness their data.” MongoDB and Unstructured.io are bridging the gap between data and insights, ensuring businesses are well-equipped to navigate the challenges of the digital age.
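As a rough illustration of the flow described above, extracted elements could be shaped into database-ready documents along these lines. The field names below are illustrative, not Unstructured.io's exact schema:

```python
# Hypothetical shape of the JSON elements a document-extraction step emits.
elements = [
    {"type": "Title", "text": "Q3 Revenue Report"},
    {"type": "NarrativeText", "text": "Revenue grew 12% quarter over quarter."},
    {"type": "NarrativeText", "text": "   "},  # empty extraction to be dropped
]

def to_documents(elements, source_file):
    """Shape extracted elements into documents ready for a MongoDB collection."""
    return [
        {"source": source_file, "kind": el["type"], "text": el["text"]}
        for el in elements
        if el["text"].strip()  # skip whitespace-only extractions
    ]

docs = to_documents(elements, "report.pdf")
# With a live cluster, these would be inserted via collection.insert_many(docs)
# and their text embedded for Atlas Vector Search.
```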
Whether you’re a seasoned entrepreneur or just starting out, it's time to harness the untapped potential of your unstructured data. Visit Unstructured.io to get started with any of their open-source libraries. Or join Unstructured’s community Slack and explore how to seamlessly use your data in conjunction with large language models.
Making sense of complex contracts with entity extraction and analysis
Catylex is a revolutionary contract analytics solution for any business that needs to extract and optimize contract data. The company’s best-in-class contract AI automatically recognizes thousands of legal and business concepts out of the box, making it easy to get started and quickly generate value. Catylex’s AI models transform wordy, opaque documents into detailed insights revealing rights, obligations, risks, and commitments associated with the business, its suppliers, and customers. The insights generated can be used to accelerate contract review and to feed operational and risk data into core business systems (CLMs, ERPs, etc.) and teams. Documents are processed using Catylex’s proprietary extraction pipeline, which uses a combination of machine learning/NLP techniques (custom Named Entity Recognition, Text Classification) and domain-expert augmentation to parse documents into an easy-to-query ontology. This eliminates the need for end users to annotate data or train any custom models. The application is very intuitive and provides easy-to-use controls to quality-check the system-extracted data, search and query using a combination of text and concepts, and generate visualizations across portfolios. You can try all of this for free by signing up for the “Essentials” version of Catylex . Catylex leverages a suite of applications and features from the MongoDB Atlas developer data platform .
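One of those features is Atlas Search with result highlighting. A minimal sketch of such a query, expressed as an aggregation pipeline; the index name ("default") and the "clauses" field are assumptions for illustration, not Catylex's actual schema:

```python
# Atlas Search query with highlighting enabled, as an aggregation pipeline.
pipeline = [
    {
        "$search": {
            "index": "default",
            # full-text match over the (assumed) clauses field
            "text": {"query": "termination rights", "path": "clauses"},
            # ask Atlas Search to return highlighted passages for the same field
            "highlight": {"path": "clauses"},
        }
    },
    {
        "$project": {
            "title": 1,
            # surface the highlight metadata alongside each hit
            "highlights": {"$meta": "searchHighlights"},
        }
    },
]
# With pymongo and a live Atlas cluster:
#   results = db.contracts.aggregate(pipeline)
```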
It uses the MongoDB Atlas database to store documents and extracted metadata due to its flexible data model and easy-to-scale options, and it uses Atlas Search to provide end users with easy-to-use and efficient text search capabilities. Features like highlighting within Atlas Search add a lot of value and enhance the user experience. Atlas Triggers are used to handle change streams and efficiently relay information to various parts of the Catylex application to make it event-driven and scalable. Catylex is actively evaluating Atlas Vector Search. Bringing together vector search, keyword search, and the database in a single, fully synchronized, and flexible storage layer, accessed by a single API, will simplify development and eliminate technology sprawl. Being part of the MongoDB AI Innovators Program gives Catylex’s engineers direct access to the product management team at MongoDB, helping them share feedback and receive the latest product updates and best practices. The provision of Atlas credits reduces the costs of experimenting with new features. Co-marketing initiatives help build visibility and awareness of the company’s offerings.
Harness Generative AI with observed and dark data for customer 360
Dataworkz enables enterprises to harness the power of LLMs with their own proprietary data for customer applications. The company’s products empower businesses to effortlessly develop and implement retrieval-augmented generation (RAG) applications using proprietary data, utilizing either public LLM APIs or privately hosted open-source foundation models. The emergence of hallucinations presents a notable obstacle to the widespread adoption of generative AI within enterprises. Dataworkz streamlines the implementation of RAG applications, enabling generative AI to reference its sources and consequently enhancing traceability.
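A minimal sketch of the retrieval side of such a RAG setup, using a $vectorSearch aggregation stage. The index and field names here are illustrative assumptions, not Dataworkz's actual implementation:

```python
def build_retrieval_pipeline(query_vector, k=5):
    """Sketch of the retrieval stage of a RAG application using Atlas
    Vector Search. Index and field names are illustrative assumptions."""
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": 20 * k,  # oversample candidates, keep the top k
                "limit": k,
            }
        },
        # return each chunk's text with its similarity score
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]

# The top-k chunks returned by this pipeline would then be concatenated
# into the LLM prompt as grounding context.
pipeline = build_retrieval_pipeline([0.12, 0.91, 0.33], k=4)
```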
As a result, users can easily use conversational natural language to produce high-quality, LLM-ready customer 360 views powering chatbots, question-answering systems, and summarization services. Dataworkz provides connectors for a vast array of customer data sources. These include back-office SaaS applications such as CRM, marketing automation, and finance systems. In addition, leading relational and NoSQL databases, cloud object stores, data warehouses, and data lakehouses are all supported. Dataflows – composable, AI-enabled workflows – are sets of steps that users combine and arrange to perform any sort of data transformation, from creating vector embeddings to complex JSON transformations. Users can describe data wrangling tasks in natural language, have LLMs orchestrate the processing of data in any modality, and merge it into a “golden” 360-degree customer view. MongoDB Atlas is used to store the source document chunks for this customer 360-degree view, and Atlas Vector Search is used to index and query the associated vector embeddings. The generation of outputs produced by the customer’s chosen LLM is augmented with similarity search and retrieval powered by Atlas. Public LLMs such as those from OpenAI and Cohere, or privately hosted LLMs such as Databricks Dolly, are also available. The integrated experience of the MongoDB Atlas database and Atlas Vector Search simplifies developer workflows. Dataworkz has the freedom and flexibility to meet its customers wherever they run their business, with multi-cloud support. For Dataworkz, access to Atlas credits and the MongoDB partner ecosystem were key drivers for becoming part of the AI Innovators Program.
What's next?
If you are building AI-enabled apps on MongoDB, sign up for our AI Innovators Program . We’ve had applicants from all industries building for a huge diversity of new use cases.
To get a flavor, take a look at earlier blog posts in this series: Building AI with MongoDB: First Qualifiers includes AI at the network edge for computer vision and augmented reality; risk modeling for public safety; and predictive maintenance paired with Question-Answering systems for maritime operators. Building AI with MongoDB: Compliance to Copilots features AI in healthcare along with intelligent assistants that help product managers specify better products and help sales teams compose emails that convert 2x higher. Finally, check out our MongoDB for Artificial Intelligence resources page for the latest best practices that get you started in turning your idea into AI-driven reality.
A Powerful Platform for Parents and Educators
When I created the first versions of OWNA , I started with a target customer: my wife. When my children were entering childcare, my wife and I realized we had little visibility into what was happening during the day. When I arrived to pick up my child, I often forgot to ask for the stats of the day, and when my wife asked me, I wouldn’t have a clue because I’d forgotten to look at the paper-based report that detailed whether my child had eaten, the number of nappy changes, and whether they had napped. Starting from that foundation, I asked lots of questions and learned that childcare centers face many challenges. The problem wasn’t a lack of intent on the part of the staff at the childcare center. They simply lacked the tools to record this information in an effective way that didn’t get in the way of the work they were doing. That led me to pivot from a parent-centric view to a broader one. Having started the initial development on MongoDB's document database , I was able to scale and iterate because I had a platform that could grow and be easily adapted. OWNA started as a tool for one childcare center and has evolved to cover the full gamut of services that childcare centers offer. From that single center, OWNA is now used in over 2,500 childcare centers across Australia, and we have created localized versions for North America and Europe.
How to create an app that meets challenging compliance requirements and offers flexibility to meet diverse needs
When I started this journey, I looked at how information was recorded and managed at my local childcare center. Almost everything was on paper. Parents want to be able to easily access the information educators are recording, and educators and the centers themselves need to store that data and make sure they meet compliance obligations.
Paper-based records are costly to store and difficult to search, and centers are subject to regulatory obligations to maintain records. With childcare centers moving toward electronic systems, we also solved another problem – the sprawl of disjointed applications centers used. We learned that there was a lot of switching between apps and copying data to ensure information was synchronized across applications. OWNA is a one-stop shop for childcare centers. It enables them to record and share everything from meals and nappy changes, manage staff and rosters, capture documents, images, and video, and support back-office operations with comprehensive Customer Relationship Management (CRM) and payment platforms. By listening carefully to the needs of educators and parents, we developed OWNA to meet the requirements of both groups.
MongoDB Atlas enabled OWNA to scale and adapt to new customer needs
MongoDB has been foundational to OWNA’s success. We needed a database that was easy to set up, used few system resources, and didn’t get in the way as we added features. MongoDB met those needs with flexible data structures without compromising performance. One of the key benefits of building on the MongoDB foundation is the ability to adapt the database to meet new customer needs. For example, when it came to recording when children ate, teachers initially recorded a simple yes or no in a field. However, we were able to change that field type, on the fly, into a field that allowed educators to enter how much of a meal was eaten. That change was important to parents and gave educators the ability to communicate more clearly with parents and carers. As the app’s popularity grew, we wanted to ensure OWNA was secure, scalable, and resilient. While MongoDB’s self-managed database was a great platform for us to start our journey with OWNA, as we grew we needed something to enable the business to scale and free up even more developer time.
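The yes/no-to-description change described above is a good example of the document model's flexibility: old and new record shapes can coexist in one collection while the application handles both. A sketch, with illustrative field names rather than OWNA's actual schema:

```python
# Records written before and after the meal field changed shape.
meal_records = [
    {"child": "A", "ate": True},             # original yes/no records
    {"child": "B", "ate": "most of lunch"},  # newer, more descriptive records
]

def describe_meal(record):
    """Render either record shape as a human-readable summary."""
    value = record["ate"]
    if isinstance(value, bool):
        return "ate" if value else "did not eat"
    return value  # already a description

summaries = [describe_meal(r) for r in meal_records]
```

In a rigid relational schema this change would typically require a migration; with documents, the application simply tolerates both shapes during the transition.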
It was at this point that we started looking at MongoDB Atlas, as the managed service meant almost all of the operational and management burden was either completely removed or reduced to a few clicks. Moving to Atlas gave us the power not only to scale the application to more clients but also to increase our developer productivity, which meant we could focus our efforts on building an even better app. We could devote resources to development and customer support rather than managing the database. This shift enabled OWNA to scale more effectively, and the superior business continuity – with increased uptime and better resilience – had a direct positive impact on our customers too. MongoDB Atlas lets us take advantage of multiple cloud providers. In our case, we use Microsoft Azure and Google Cloud Platform depending on the region or service we’re looking for.
MongoDB Atlas enables global growth and expansion of OWNA's services
The platform we’ve built on is now powering our next wave of innovation and development. For example, we’re launching the Family Marketplace – an online store for parents and educators. They’ll be able to order supplies such as nappies, stationery, craft supplies, and other essentials directly from OWNA. MongoDB Atlas will be the foundation, and we'll use MongoDB Atlas Search so that users can find products and receive recommendations, making it easy for educators and parents to find the items they need. Using MongoDB Atlas Search eliminates the need for OWNA to run a separate search system alongside the database. This simplifies the architecture and helps developers focus on value rather than managing data integration and syncing. The entire process will be handled within OWNA. Goods will be delivered directly to the center. For parents, this eliminates squeezing trips to shops between drop-offs, pick-ups, and work. The story for us doesn’t stop with OWNA. We’re also creating two new apps that are built on MongoDB.
ERLY is a workforce management tool that enables small businesses to manage recruitment, rosters, payroll, and other key activities. And, by listening to educators who use OWNA, we learned that there was a desire for an app where qualified childcare workers could offer their services as babysitters. That led to the development of Nurture – a service that connects parents to babysitters. MongoDB’s tools let us develop apps with less code. The apps we create are easy to maintain, and we can develop new features faster than with other platforms. The development and growth of OWNA has, from the first moment, been powered by MongoDB. The ability to quickly develop apps and features, easily maintain them, and deploy them either on-premises, on hybrid infrastructure, or wholly in the cloud has enabled OWNA to grow and expand globally. Kheang Ly is the founder and CTO of OWNA, overseeing the entire OWNA operation and building the best and most innovative platform. Learn more about OWNA .
Resource Tags in Billing Invoices Now Generally Available
In June, we announced MongoDB Atlas's new resource tagging capability – built to provide a simple and easy way to organize and manage database deployments at scale. Today, we are pleased to announce the availability of these tags in billing invoice CSVs and Admin API operations. Leveraging tags within billing invoices empowers customers to categorize spending based on custom dimensions (such as cost center) and streamline financial reporting.
Why we built Resource Tags in Billing Invoices
Today, billing-related tags are the recommended way to allocate costs based on dimensions unique to customer organizations. Previously, customers were exporting CSV invoices or using the Admin API to manually append additional metadata and generate custom financial reports, resulting in a significant amount of lost time. Other customers were naming their deployments after internal billing references instead of meaningful names for their development teams. Addressing this leads to an enhanced user experience, streamlines reporting processes, and ensures invoices can meet the unique cost allocation needs of our customers.
How customers can benefit from Resource Tags in Billing Invoices
In today's cost-conscious macro-environment, allowing customers to segment their spending based on custom metadata enables them to optimize financial reporting by categorizing expenses based on custom dimensions that go beyond project or cluster names. With the flexibility to break down costs according to specific business units, engineering environments, or individual lines of business, organizations of all sizes can tailor their cost allocation accordingly. By streamlining the reporting process and providing comprehensive financial visibility, customers are empowered to make more informed decisions.
How to get started
Customers can get started today by adding tags to new or existing deployments using the Atlas UI, Admin API, or CLI.
Once added, the tag keys will appear as column headings in the billing invoice CSV export and as a field in the Invoices payload for Admin API calls. Tags will typically appear on invoice line items within 48 hours of being added to the deployment. Note that tags will not appear on cluster-related line items that are not associated with a tag, and line items not billed at the cluster level are not eligible for tagging at this time. The example below shows a CSV export for the fictitious project, Leafy. This organization has two tag keys (Environment and Cost-Center) associated with different clusters. Notice that the cluster named “Blue” does not have the Cost-Center tag applied, so those cells are blank.
Resource tagging best practices
Tagging approaches can range from simple to complex, depending on the needs of the organization. We have outlined tips and best practices to get started:
- Do not include any sensitive information such as Personally Identifiable Information (PII) or Protected Health Information (PHI) in resource tag keys or values.
- Use a standard naming convention – including spelling, case, and punctuation – for all tags.
- Define and communicate a strategy that enforces mandatory tags on all database deployments. We recommend you start by identifying the environment and the application/service/workload.
- Use namespaces or prefixes to easily identify tags owned by different business units.
- Use automated tools such as Terraform or the Admin API to programmatically manage database deployments and tags.
- Implement a tag governance routine that regularly checks for untagged or improperly tagged deployments.
Conclusion
Tags applied to billing invoices meet customers' critical needs by allowing them to categorize spending based on custom dimensions, optimizing financial reporting, and increasing trust and efficiency.
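Once tags appear as CSV columns, allocating spend becomes a short scripting exercise. A sketch using an illustrative CSV resembling the export described above; the column names are assumptions:

```python
import csv
import io

# Illustrative invoice export: the "Blue" cluster has no Cost-Center tag,
# so that cell is blank, as in the Leafy example.
invoice_csv = """Cluster,Amount,Environment,Cost-Center
Green,120.00,prod,1001
Blue,80.00,dev,
"""

rows = list(csv.DictReader(io.StringIO(invoice_csv)))

# Group spend by Cost-Center; untagged line items land in an "untagged" bucket
# so nothing silently disappears from the report.
by_cost_center = {}
for row in rows:
    key = row["Cost-Center"] or "untagged"
    by_cost_center[key] = by_cost_center.get(key, 0.0) + float(row["Amount"])
```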
Customers gain the flexibility to break down costs according to various business units, engineering environments, or specific lines of business, making financial analysis more efficient and more insightful than ever before. Sign up for MongoDB Atlas , our cloud database service, to see tagging in action, and for more information, see Atlas Resource Tagging .
Building AI with MongoDB: How VISO TRUST is Transforming Cyber Risk Intelligence
Since announcing MongoDB Atlas Vector Search preview availability back in June, we’ve seen rapid adoption from developers building a wide range of AI-enabled apps. Today we're going to talk to one of these customers. VISO TRUST puts reliable, comprehensive, actionable vendor security information directly in the hands of decision-makers who need to make informed risk assessments. The company uses a combination of state-of-the-art models from OpenAI, Hugging Face, Anthropic, Google, and AWS, augmented by vector search and retrieval from MongoDB Atlas. We sat down with Pierce Lamb, Senior Software Engineer on the Data and Machine Learning team at VISO TRUST, to learn more.
Tell us a little bit about your company. What are you trying to accomplish, and how does that benefit your customers or society more broadly?
VISO TRUST is an AI-powered third-party cyber risk and trust platform that enables any company to access actionable vendor security information in minutes. VISO TRUST delivers the fast and accurate intelligence needed to make informed cybersecurity risk decisions at scale for companies at any maturity level. Our commitment to innovation means that we are constantly looking for ways to optimize business value for our customers. VISO TRUST ensures that complex business-to-business (B2B) transactions adequately protect the confidentiality, integrity, and availability of trusted information. VISO TRUST’s mission is to become the largest global provider of cyber risk intelligence and the intermediary for business transactions. Through the use of VISO TRUST, customers will reduce their threat surface in B2B transactions with vendors and thereby reduce their overall risk posture and potential security incidents like breaches, malicious injections, and more. Today VISO TRUST has many great enterprise customers like Instacart, Gusto, and Upwork, and they all say the same thing: 90% less work, 80% reduction in time to assess risk, and near 100% vendor adoption.
Because it’s the only approach that can deliver accurate results at scale, for the first time, customers are able to gain complete visibility into their entire third-party populations and take control of their third-party risk.
Describe what your application does and what role AI plays in it.
The VISO TRUST platform uses patented, proprietary machine learning and a team of highly qualified third-party risk professionals to automate this process at scale. Simply put, VISO TRUST automates vendor due diligence and reduces third-party risk at scale. Security teams can stop chasing vendors, reading documents, or analyzing spreadsheets.
Figure 1: VISO TRUST is the only SaaS third-party cyber risk management platform that delivers the rapid security intelligence needed for modern companies to make critical risk decisions early in the procurement process
The VISO TRUST platform easily engages third parties, saving everyone time and resources. In a 5-minute web-based session, third parties are prompted to upload relevant artifacts of the security program that already exists, and our supervised AI – which we call Artifact Intelligence – does the rest. Security artifacts that enter VISO’s Artifact Intelligence pipeline interact with AI/ML in three primary ways. First, VISO deploys discriminator models that produce high-confidence predictions about features of the artifact. For example, one model performs artifact classification, another detects organizations inside the artifact, another predicts which pages are likely to contain security controls, and more. Our modules reference a comprehensive set of over 25 security frameworks and use document heuristics and natural language processing to analyze any written material and extract all relevant control information.
Second, artifacts have text content parsed out of them in the form of sentences, paragraphs, headers, table rows, and more; these text blobs are embedded and stored in MongoDB Atlas to become part of our dense retrieval system. This dense retrieval system performs retrieval-augmented generation (RAG) using MongoDB features like Atlas Vector Search to provide ranked context to large language model (LLM) prompts. Third, we use RAG results to seed LLM prompts and chain together their outputs to produce extremely accurate factual information about the artifact in the pipeline. This provides instant intelligence to customers that previously took weeks to produce. VISO TRUST’s risk model analyzes your risk and delivers a complete assessment that provides everything you need to know to make qualified risk decisions about the relationship. In addition, the platform continuously monitors and reassesses third-party vendors to ensure compliance.
What specific AI/ML techniques, algorithms, or models are utilized in your application?
For our discriminator models, we research the state-of-the-art pre-trained models (typically narrowed by those contained in Hugging Face’s transformers package) and perform fine-tuning of these models using our dataset. For our dense retrieval system, we use MongoDB Atlas Vector Search, which internally uses the Hierarchical Navigable Small World (HNSW) algorithm to retrieve embeddings similar to the embedded text content. We have plans to perform re-ranking of these results as well. For our LLM system, we have experimented with GPT-3.5-turbo, GPT-4, Claude 1 & 2, Bard, Vertex, and Bedrock. We blend a variety of these based on our customers' accuracy, latency, and security needs.
Can you describe other AI technologies used in your application stack?
Some of the other frameworks we use are Hugging Face transformers, evaluate, accelerate, and Datasets, as well as PyTorch, WandB, and Amazon SageMaker.
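As a generic illustration of the parsing step described above, text can be split into overlapping chunks before embedding. This is a simplified sketch; VISO TRUST's actual pipeline parses sentences, headers, table rows, and more:

```python
def chunk_text(text, max_words=50, overlap=10):
    """Split text into overlapping word windows ready for embedding.
    Overlap preserves context that would otherwise be cut at chunk edges."""
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunk = words[start:start + max_words]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + max_words >= len(words):
            break
    return chunks

chunks = chunk_text(" ".join(f"word{i}" for i in range(120)))
# 120 words with max_words=50 and overlap=10 yields windows starting at
# word offsets 0, 40, and 80.
```

Each chunk would then be run through an embedding model and stored in MongoDB alongside its source metadata, ready for Atlas Vector Search to rank at query time.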
We have a library for ML experiments (fine-tuning) that is custom-built, a library for workflow orchestration that is custom-built, and all of our prompt engineering is custom-built.
Why did you choose MongoDB as part of your application stack? Which MongoDB features are you using, and where are you running MongoDB?
The VISO TRUST platform relies on effective solutions and tools like MongoDB's distinctive attributes to fulfill specific objectives. MongoDB supports our platform's mechanism to engage third parties efficiently, employing both AI and human oversight to automate the assessment of security artifacts at scale. The fundamental value proposition of MongoDB – a robust document database – is why we originally chose it. It was originally deployed as a storage/retrieval mechanism for all the factual information our Artifact Intelligence pipeline produces about artifacts. While it still performs this function today, it has now become our “vector/metadata database.” MongoDB executes fast ranking of large quantities of embedded text blobs for us, while Atlas provides us with all the ease of use of a cloud-ready database. We use both the Atlas Search index visualization and the query profiler visualization daily. Even just the basic display of a few documents in collections often saves time. Finally, when we recently backfilled embeddings across one of our MongoDB deployments, Atlas automatically provisioned more disk space for large indexes without us needing to be around, which was incredibly helpful.
What are the benefits you've achieved by using MongoDB?
I would say there are two primary benefits that have greatly helped us with respect to MongoDB and Atlas. First, MongoDB was already a place where we were storing metadata about artifacts in our system; with the introduction of Atlas Vector Search, we now have a comprehensive vector/metadata database – one that’s been battle-tested over a decade – that solves our dense retrieval needs.
There is no need to deploy a new database that we have to manage and learn. Our vectors and artifact metadata can be stored right next to each other. Second, Atlas has been helpful in making all the painful parts of database management easy. Creating indexes, provisioning capacity, alerting on slow queries, visualizing data, and much more have saved us time and allowed us to focus on more important things.
What are your future plans for new applications, and how does MongoDB fit into them?
Retrieval-augmented generation is going to continue to be a first-class feature of our application. In this regard, the evolution of Atlas Vector Search and its ecosystem in MongoDB will be highly relevant to us. MongoDB has become the database our ML team uses, so as our ML footprint expands, our use of MongoDB will expand.
Getting started
Thanks so much to Pierce for sharing details on VISO TRUST’s AI-powered applications and experiences with MongoDB. The best way to get started with Atlas Vector Search is to head over to the product page . There you will find tutorials, documentation, and whitepapers, along with the ability to sign up for MongoDB Atlas. You’ll be just a few clicks away from spinning up your own vector search engine where you can experiment with the power of vector embeddings and RAG. We’d love to see what you build, and we're eager for any feedback that will make the product even better in the future!
The Challenges and Opportunities of Processing Streaming Data
Let’s consider a fictitious bank that has a credit card offering for its customers. Transactional data might land in its database from various sources, such as a REST API call from a web application or a serverless function call made by a cash machine. Regardless of how the data was written to the database, the database performed its job and made the data available for querying by the end user or application. The mechanics are database-specific, but the end goal of all databases is the same: once data is in a database, the bank can query it and obtain business value from it. In the beginning, this architecture worked well, but over time customer usage grew and the bank found it difficult to manage the volume of transactions. The company decided to do what many customers in this scenario do and adopted an event-streaming platform like Apache Kafka to queue this event data. Kafka provides a highly scalable event-streaming platform capable of managing large data volumes without putting debilitating pressure on traditional databases. With this new design, the bank could scale, supporting more customers and product offerings. Life was great until some customers started complaining about unrecognized transactions occurring on their cards. Customers were refusing to pay for these, and the bank was starting to spend lots of resources figuring out how to manage these fraudulent charges. After all, by the time the data gets written into the database and batch loaded into the systems that can process it, the user's credit card may already have been charged, perhaps a few times over. However, hope is not lost. The bank realized that if it could query the transactional event data as it flows into the database, it might be able to compare it with historical spending data from the user, as well as geolocation information, to make a real-time determination of whether the transaction was suspicious and warranted further confirmation by the customer.
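The continuous query the bank needs can be expressed with aggregation-like stages. Here is a hedged sketch of a stream processing pipeline definition; the connection, database, and collection names are illustrative assumptions:

```python
# A fraud-style continuous query, expressed as aggregation-like stages.
pipeline = [
    # read events continuously from the transaction stream
    {"$source": {"connectionName": "card_transactions"}},
    # flag charges that exceed a simple threshold; a real system would
    # compare against the customer's history and geolocation instead
    {"$match": {"amount": {"$gt": 1000}}},
    # write suspicious events out to a collection for review
    {"$merge": {"into": {"connectionName": "atlas", "db": "fraud", "coll": "alerts"}}},
]
```

Unlike a one-shot database query, a pipeline like this runs continuously, evaluating each event as it arrives rather than after it has been batch loaded.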
This ability to continuously query the stream of data is what stream processing is all about. From a developer's perspective, building applications that work with streaming data is challenging. Developers need to consider the following:
- Different serialization formats: The data that arrives in the stream may use different serialization formats, such as JSON, Avro, Protobuf, or even binary.
- Different schemas: Data originating from a variety of sources may contain slightly different schemas. A field like CustomerID could be customerId from one source or CustID in another, and a third source might not use the field at all.
- Late-arriving data: The data itself could arrive late due to network latency issues, or arrive completely out of order.
- Operational complexity: Developers need to react to application state changes, like failed connections to data sources, and efficiently scale the application to meet the demands of the business.
- Security: In larger enterprises, the developer usually doesn’t have access to production data. This makes troubleshooting and building queries from this data difficult.
Stream processing can help address these challenges and enable real-time use cases, such as fraud detection, hyper-personalization, and predictive maintenance, that are otherwise difficult or extremely costly to implement. While many stream processing solutions exist, the flexibility of the document model and the power of the aggregation framework are naturally well suited to help developers with the challenges found in complex event data.
Discover MongoDB Atlas Stream Processing
Check out the MongoDB Atlas Stream Processing announcement blog post.
Request private preview access to Atlas Stream Processing
Learn more about Atlas Stream Processing and request access to participate in the private preview once it opens to developers.
New to MongoDB? Get started for free today by signing up for MongoDB Atlas .
MongoDB.local is Coming to Hong Kong
MongoDB is hosting a series of events in cities all around the world that we're calling "MongoDB.local", and our Hong Kong event is approaching fast. Don't forget to reserve a spot on your calendar for MongoDB.local Hong Kong on September 5, a one-day conference on all things MongoDB, including new product announcements, technical sessions, and opportunities to learn how customers are using MongoDB to reinvent their businesses. Hear from our guest speakers, including Cathay Pacific Airways , Manulife Asia , TVB , and Omnichat , who will share how they use MongoDB in various innovative ways to improve their operations and deliver enhanced customer experiences.

The MongoDB user community represents a wide range of backgrounds and experiences, and we've tailored our sessions to cater to that diversity. From data modeling and schema design to app-driven analytics and mobile data sync, our sessions will explore real-world scenarios and how to optimize MongoDB for performance, reliability, and scale.

There's more than one way to get your questions answered at MongoDB.local. You can see live demos at the MongoDB booth and engage in networking throughout the day. In addition, attendees will be able to learn about the latest news on the product front. We've been hearing from customers who are facing a lot of challenges today, including increasing data privacy and security requirements, as well as the pain of modernizing their applications. So we'll be announcing new solutions for de-risking application migration and modernization, along with some very exciting product announcements that are enabling customers to build next-generation applications. In the keynote, members of our executive leadership team — including MongoDB Chief Marketing & Strategy Officer, Peder Ulander — will announce the latest features and updates for MongoDB, along with how they see the development of modern applications shaping up over the next few years.
We look forward to seeing you at MongoDB.local Hong Kong , so don't wait to register.