4 results

Fusing MongoDB and Databricks to Deliver AI-Augmented Search

With customers' attention more and more dispersed across channels, platforms, and devices, the retail industry rages with the relentless competition. The customer’s search experience on your storefront is the cornerstone of capitalizing on your Zero Moment of Truth, the point in the buying cycle where the consumer's impression of a brand or product is formed. Imagine a customer, Sarah, eager to buy a new pair of hiking boots. Instead of wandering aimlessly through pages and pages of search results, she expects to find her ideal pair easily. The smoother her search, the more likely she is to buy. Yet, achieving this seamless experience isn't a walk in the park for retailers. Enter the dynamic duo of MongoDB and Databricks. By equipping their teams with this powerful tech stack, retailers can harness the might of real-time in-app analytics. This not only streamlines the search process but also infuses AI and advanced search functionalities into e-commerce applications. The result? An app that not only meets Sarah's current expectations but anticipates her future needs. In this blog, we’ll help you navigate through what are the main reasons to implement an AI-augmented search solution by integrating both platforms. Let’s embark on this! A solid foundation for your data model For an e-commerce site built around the principles of an Event Driven and MACH Architecture , the data layer will need to ingest and transform data from a number of different sources. Heterogeneous data, such as product catalog, user behavior on the e-commerce front-end, comments and ratings, search keywords, and customer lifecycle segmentation- all of this is necessary to personalize search results in real time. This increases the need for a flexible model such as in MongoDB’s documents and a platform that can easily take in data from a number of different sources- from API, CSV, and Kafka topics through the MongoDB Kafka Connector . MongoDB's Translytical capabilities, combining transactional (OLTP) and analytical (OLAP) offer real-time data processing and analysis, enabling you to simplify your workloads while ensuring timely responsiveness and cost-effectiveness. Now the data platform is servicing the operational needs of the application- what about adding in AI? Combining MongoDB with Databricks, using the MongoDB Spark Connector can allow you to train your models with your operational data from MongoDB easily and to trigger them to run in real-time to augment your application as the customer is using it. Centralization of heterogeneous data in a robust yet flexible Operational Data Layer The foundation of an effective e-commerce data layer lies in having a solid yet flexible operational data platform, so the orchestrating of ML models to run at specific timeframes or responding to different events, enabling crucial data transformation, metadata enrichment, and data featurization becomes a simple, automated task for optimizing search result pages and deliver a frictionless purchasing process. Check out this blog for a tutorial on achieving near real-time ingestion using the Kafka Connector with MongoDB Atlas, and data processing with Databricks Spark User Defined Functions. Adding relevance to your search engine results pages To achieve optimal product positioning on the Search Engine Results Page (SERP) after a user performs a query, retailers are challenged with creating a business score for their products' relevance. This score incorporates various factors such as stock levels, competitor prices, and price elasticity of demand. These business scores are complex real-time analyses calibrated against so many factors- it’s a perfect use case for AI. Adding AI-generated relevance to your SERPs can accurately predict and display search results that are most relevant to users' queries, leading to higher engagement and increased click-through rates, while also helping businesses optimize their content based on the operational context of their markets. The ingestion into the MongoDB Atlas document-based model laid the groundwork for this challenge, and leveraging the MongoDB Apache Spark Streaming Connector companies can persist their data into Databricks, taking advantage of its capabilities for data cleansing and complex data transformations, making it the ideal framework for delivering batch training and inference models. Diagram of the full architecture integrating MongoDB Atlas and Databricks for an e-commerce store, real-time analytics, and search MongoDB App Services act as the mortar of our solution, achieving an overlap of the intelligence layer in an event-driven way, making it not only real-time but also cost-effective and rendering both your applications and business processes nimble. Make sure to check out this GitHub repository to understand in depth how this is achieved. Data freshness Once that business score can be calculated comes the challenge of delivering it over the search feature of your application. With MongoDB Atlas native workload isolation, operational data is continuously available on dedicated analytics nodes deployed in the same distributed cluster, and exposed to analysts within milliseconds of being stored in the database. But data freshness is not only important for your analytics use cases, combining both your operational data with your analytics layer, retailers power in-app analytics and build amazing user experiences across your customer touch points. Considering MongoDB Atlas Search 's advanced features such as faceted search, auto-complete, and spell correction, retailers rest assured of a more intuitive and user-friendly search experience not only for their customers but for their developers, as it minimizes the tax of operational complexity as all these functionalities are bundled in the same platform. App-driven analytics is a competitive advantage against traditional warehouse analytics Additionally, the search functionality is optimized for performance, enabling businesses to handle high search query volumes without compromising user experience. The business score generated from the AI models trained and deployed with Databricks will provide the central point to act as a discriminator over where in the SERPs any of the specific products appear, rendering your search engine relevance fueled and securing the delivery of a high-quality user experience. Conclusion Search is a key part of the buying process for any customer. Showing customers exactly what they are looking for without investing too much time in the browsing stage reduces friction in the buying process, but as we’ve seen it might not be so easy technically. Empower your teams with the right tech stack to take advantage of the power of real-time in-app analytics with MongoDB and Databricks. It’s the simplest way to build AI and search capabilities into your e-commerce app, to respond to current and future market expectations. Check out the video below and this GitHub repository for all the code needed to integrate MongoDB and Databricks and deliver a real-time machine-learning solution for AI-augmented Search.

September 19, 2023

Building AI with MongoDB: Unlocking Value from Multimodal Data

One of the most powerful capabilities of AI is its ability to learn, interpret, and create from input data of any shape and modality. This could be structured records stored in a database to unstructured text, computer code, video, images, and audio streams. Vector embeddings are one of the key AI enablers in this space. Encoding our data as vector embeddings dramatically expands the ability to work with this multimodal data. We’ve gone from depending on data scientists training highly specialized models just a few years ago to developers today building general-purpose apps incorporating NLP and computer vision. The beauty of vector embeddings is that data that is unstructured and therefore completely opaque to a computer can now have its meaning and structure inferred and represented via these embeddings. Using a vector store such as Atlas Vector Search means we can search and compute unstructured and multimodal data in the same way we’ve always been able to with structured business data. Now we can search for it using natural language, rather than specialized query languages. Considering that 80%+ of the data that enterprises create every day is unstructured, we start to see how vector search combined with LLMs and generative AI opens up new use cases and revenue streams. In this latest round-up of companies building AI with MongoDB, we feature three examples who are doing just that. The future of business data: Unlocking the hidden potential of unstructured data In today's data-driven world, businesses are always searching for ways to extract meaningful insights from the vast amounts of information at their disposal. From improving customer experiences to enhancing employee productivity, the ability to leverage data enables companies to make more informed and strategic decisions. However, most of this valuable data is trapped in complex formats, making it difficult to access and analyze. That's where comes in. Imagine an innovative tool that can take all of your unstructured data – be it a PDF report, a colorful presentation, or even an image – and transform it into an easily accessible format. This is exactly what does. They delve deep, pulling out crucial data, and present it in a simple, universally understood JSON format. This makes your data ready to be transformed, stored and searched in powerful databases like MongoDB Atlas Vector Search . What does this mean for your business? It's simple. By automating the data extraction process, you can quickly derive actionable insights, offering enhanced value to your customers and improving operational efficiencies. Unstructured also offers an upcoming image-to-text model. This provides even more flexibility for users to ingest and process nearly any file containing natural language data. And, keep an eye out for notable upgrades in table extraction – yet another step in ensuring you get the most from your data. isn't just a tool for tech experts. It's for any business aiming to understand their customers better, seeking to innovate, and looking to stay ahead in a competitive landscape. Unstructured’s widespread usage is a testament to its value – with over 1.5 million downloads and adoption by thousands of enterprises and government organizations. Brian Raymond, the founder and CEO of, perfectly captures this synergy, saying, “As the world’s most widely used natural language ingestion and preprocessing platform, partnering with MongoDB was a natural choice for us. This collaboration allows for even faster development of intelligent applications. Together, we're paving the way businesses harness their data.” MongoDB and are bridging the gap between data and insights, ensuring businesses are well-equipped to navigate the challenges of the digital age. Whether you’re a seasoned entrepreneur or just starting, it's time to harness the untapped potential of your unstructured data. Visit to get started with any of their open-source libraries. Or join Unstructured’s community Slack and explore how to seamlessly use your data in conjunction with large language models. Making sense of complex contracts with entity extraction and analysis Catylex is a revolutionary contract analytics solution for any business that needs to extract and optimize contract data. The company’s best-in-class contract AI automatically recognizes thousands of legal and business concepts out-of-the-box, making it easy to get started and quickly generate value. Catylex’s AI models transform wordy, opaque documents into detailed insights revealing rights, obligations, risks, and commitments associated with the business, its suppliers, and customers. The insights generated can be used to accelerate contract review and to feed operational and risk data into core business systems (CLMs, ERPs, etc.) and teams. Documents are processed using Catylex’s proprietary extraction pipeline that uses a combination of various machine learning/NLP techniques (custom Named Entity Recognition, Text Classification) and domain expert augmentation to parse documents into an easy-to-query ontology. This eliminates the need for end users to annotate data or train any custom models. The application is very intuitive and provides easy-to-use controls to Quality Check the system-extracted data, search and query using a combination of text and concepts, and generate visualizations across portfolios. You can try all of this for free by signing up for the “Essentials'' version of Catylex . Catylex leverages a suite of applications and features from the MongoDB Atlas developer data platform . It uses the MongoDB Atlas database to store documents and extract metadata due to its flexible data model and easy-to-scale options, and it uses Atlas Search to provide end users with easy-to-use and efficient text search capabilities. Features like highlighting within Atlas Search add a lot of value and enhance the user experience. Atlas Triggers are used to handle change streams and efficiently relay information to various parts within the Catylex application to make it event-driven and scalable. Catylex is actively evaluating Atlas Vector Search. Bringing together vector search alongside keyword search and database in a single, fully synchronized, and flexible storage layer, accessed by a single API, will simplify development and eliminate technology sprawl. Being part of the MongoDB AI Innovators Program gives Catylex’s engineers direct access to the product management team at MongoDB, helping to share feedback and receive the latest product updates and best practices. The provision of Atlas credits reduces the costs of experimenting with new features. Co-marketing initiatives help build visibility and awareness of the company’s offerings. Harness Generative AI with observed and dark data for customer 360 Dataworkz enables enterprises to harness the power of LLMs with their own proprietary data for customer applications. The company’s products empower businesses to effortlessly develop and implement Retrieval-Augmented Generation (RAG) applications using proprietary data, utilizing either public LLM APIs or privately hosted open-source foundation models. The emergence of hallucinations presents a notable obstacle in the widespread adoption of Gen AI within enterprises. Dataworkz streamlines the implementation of RAG applications enabling Gen AI to reference its origins, consequently enhancing traceability. As a result, users can easily use conversational natural language to produce high-quality, LLM-ready, customer 360 views powering chatbots, Question-Answering systems, and summarization services. Dataworkz provides connectors for a vast array of customer data sources. These include back-office SaaS applications such as CRM, Marketing Automation, and Finance systems. In addition, leading relational and NoSQL databases, cloud object stores, data warehouses, and data lake houses are all supported. Dataflows, aka composable AI-enabled workflows, are a set of steps that users combine and arrange to perform any sort of data transformation – from creating vector embeddings to complex JSON transformations. Users can describe data wrangling tasks in natural language, have LLMs orchestrate the processing of data in any modality, and merge it into a “golden” 360-degree customer view. MongoDB Atlas is used to store the source document chunks for this customer's 360-degree view and Atlas Vector Search is used to index and query the associated vector embeddings. The generation of outputs produced by the customer’s chosen LLM is augmented with similarity search and retrieval powered by Atlas. Public LLMs such as OpenAI and Cohere or privately hosted LLMs such as Databricks Dolly are also available. The integrated experience of the MongoDB Atlas database and Atlas Vector Search simplifies developer workflows. Dataworkz has the freedom and flexibility to meet their customers wherever they run their business with multi-cloud support. For Dataworkz, access to Atlas credits and the MongoDB partner ecosystem are key drivers for becoming part of the AI Innovators program. What's next? If you are building AI-enabled apps on MongoDB, sign up for our AI Innovators Program . We’ve had applicants from all industries building for a huge diversity of new use cases. To get a flavor, take a look at earlier blog posts in this series: Building AI with MongoDB: First Qualifiers includes AI at the network edge for computer vision and augmented reality; risk modeling for public safety; and predictive maintenance paired with Question-Answering systems for maritime operators. Building AI with MongoDB: Compliance to Copilots features AI in healthcare along with intelligent assistants that help product managers specify better products and help sales teams compose emails that convert 2x higher. Finally, check out our MongoDB for Artificial Intelligence resources page for the latest best practices that get you started in turning your idea into AI-driven reality.

September 11, 2023

Building AI with MongoDB: From Compliance to Copilots

There has been a lot of recent reporting on the desire to regulate AI. But very little has been made of how AI itself can assist with regulatory compliance. In our latest round-up of qualifiers for the MongoDB AI Innovators Program, we feature a company who are doing just that in one of the world’s most heavily regulated industries. Helping comply with regulations is just one way AI can assist us. We hear a lot about copilots coaching developers to write higher-quality code faster. But this isn’t the only domain where AI-powered copilots can shine. To round out this blog post, we provide two additional examples – a copilot for product managers that helps them define better specifications and a copilot for sales teams to help them better engage customers. We launched the MongoDB AI Innovators Program back in June this year to help companies like these “build the next big thing” in AI. Whether a freshly minted start-up or an established enterprise, you can benefit from the program, so go ahead and sign up. In the meantime, let's explore how innovators are using MongoDB for use cases as diverse as compliance to copilots. AI-powered compliance for real-time healthcare data Inovaare transforms complex compliance processes by designing configurable AI-driven automation solutions. These solutions help healthcare organizations collect real-time data across internal and external departments, creating one compliance management system. Founded 10 years ago and now with 250 employees, Inovaare's comprehensive suite of HIPAA-compliant software solutions enables healthcare organizations across the Americas to efficiently meet their unique business and regulatory requirements. They can sustain audit readiness, reduce non-compliance risks, and lower overall operating costs. Inovaare uses classic and generative AI models to power a range of services. Custom models are built with PyTorch while LLMs are built with transformers from Hugging Face and developed and orchestrated with LangChain . MongoDB Atlas powers the models’ underlying data layer. Models are used for document classification along with information extraction and enrichment. Healthcare professionals can work with this data in multiple ways including semantic search and the company’s Question-Answering chatbot. A standalone vector database was originally used to store and retrieve each document’s vector embeddings as part of in-context model prompting. Now Inovaare has migrated to Atlas Vector Search . This migration helps the company’s developers build faster through tight vector integration with the transactional, analytical, and full-text search data services provided by the MongoDB Atlas platform . Next-generation healthcare compliance platform from Inovaare. The platform provides AI-powered health plan solutions with continuous monitoring, regulatory reporting, and business intelligence. Inovaare also uses AI agents to orchestrate complex workflows across multiple healthcare business processes, with data collected from each process stored in the MongoDB Atlas database. Business users can visualize the latest state of healthcare data with natural language questions translated by LLMs and sent to Atlas Charts for dashboarding. Inovaare selected MongoDB because its flexible document data model enables the company's developers to store and query data of any structure. This coupled with Atlas’ HIPAA compliance, end-to-end data encryption, and the freedom to run on any cloud – supporting almost any application workload – helps the company innovate and release with higher velocity and lower cost than having to stitch together an assortment of disparate databases and search engines. Going forward, Inovaare plans to expand into other regions and compliance use cases. As part of MongoDB’s AI Innovators Program, the company’s engineers get to work with MongoDB specialists at every stage of their journey. The AI copilot for product managers The ultimate goal of any venture is to create and deliver meaningful value while achieving product-market fit. Ventecon 's AI Copilot supports product managers in their mission to craft market-leading products and solutions that contribute to a better future for all. Hundreds of bots currently crawl the Internet, identifying and processing over 1,000,000 pieces of content every day. This content includes details on product offerings, features, user stories, reviews, scenarios, acceptance criteria, and issues through market research data from target industries. Processed data is stored in MongoDB. Here it is used by Ventecon’s proprietary NLP models to assist product managers in generating and refining product specifications directly within an AI-powered virtual space. Patrick Beckedorf, co-founder of Ventecon says “Product data is highly context-specific and so we have to pre-train foundation models with specific product management goals, fine-tune with contextual product data, include context over time, and keep it up to date. In doing so, every product manager gets a digital, highly contextualized expert buddy.” Currently, vector embeddings from the product data stored in MongoDB are indexed and queried in a standalone vector database. As Beckedorf says, the engineering team is now exploring a more integrated approach. “The complexity of keeping vector embeddings synchronized across both source and vector databases, coupled with the overhead of running the vector store ties up engineering resources and may affect indexing and search performance. A solid architecture therefore provides opportunities to process and provide new knowledge very fast, i.e. in Retrieval-Augmented Generation (RAG), while bottlenecks in the architecture may introduce risks, especially at scale. This is why we are evaluating Atlas Vector Search to bring source data and vectors together in a single data layer. We can use Atlas Triggers to call our embedding models as soon as new data is inserted into the MongoDB database. That means we can have those embeddings back in MongoDB and available for querying almost immediately.” For Beckedorf, the collaboration with data pioneers and the co-creation opportunities with MongoDB are the most valuable aspects of the AI Innovators Program. AI sales email coaching: 2x reply rates in half the time Lavender is an AI sales email coach. It assists users in real-time to write better emails faster. Sales teams who use Lavender report they’re able to write emails in less time and receive twice as many replies. The tool uses generative AI to help compose emails. It personalizes introductions for each recipient and scores each email as it is being written to identify anything that hurts the chances of a reply. Response rates are tracked so that teams can monitor progress and continuously improve performance using data-backed insights. OpenAI’s GPT LLMs along with ChatGPT collaboratively generate email copy with the user. The output is then analyzed and scored through a complex set of business logic layers built by Lavender’s data science team, which yield industry-leading, high-quality emails. Together, the custom and generative models help write subject lines, remove jargon and fix grammar, simplify unwieldy sentences, and optimize formatting for mobile devices. They can also retrieve the recipient’s (and their company’s) latest publicly posted information to help personalize and enrich outreach. MongoDB Atlas running on Google Cloud backs the platform. Lavender’s engineers selected MongoDB because of the flexibility of its document data model. They can add fields on-demand without lengthy schema migrations and can store data of any structure. This includes structured data such as user profiles and response tracking metrics through to semi and unstructured email copy and associated ML-generated scores. The team is now exploring Atlas Vector Search to further augment LLM outputs by retrieving similar emails that have performed well. Storing, syncing, and querying vector embeddings right alongside application data will help the company’s engineers build new features faster while reducing technology sprawl. What's next? We have more places left in our AI Innovators Program , but they are filling up fast, so sign up directly on the program’s web page. We are accepting applications from a diverse range of AI use cases. To get a flavor of that diversity, take a look at our blog post announcing the first program qualifiers who are building AI with MongoDB . You’ll see use cases that take AI to the network edge for computer vision and Augmented Reality (AR), risk modeling for public safety, and predictive maintenance paired with Question-Answering systems for maritime operators. Also, check out our MongoDB for Artificial Intelligence resources page for the latest best practices that get you started in turning your idea into AI-driven reality.

August 29, 2023

Building AI with MongoDB: Announcing the First Qualifiers for the Innovators Program

Artificial Intelligence is igniting so many brilliant ideas for new products and services. But turning those ideas into reality is a path that even the brightest minds struggle to navigate without some help along the way. That’s why we launched the MongoDB AI Innovators Program back in June this year. Access to expert technical advice, free MongoDB Atlas credits, co-marketing opportunities, and – for eligible startups, introductions to potential venture investors – come together to help you “build the next big thing” in AI. Since opening the program, we’ve received applications from around the world addressing every industry and spanning the spectrum of generative to analytical AI use cases. From enterprise chat and video bots that improve customer service and unlock insights from vast internal information repositories, conversational intelligence for sales reps, AI agents for workflow orchestration, tools for talent recruitment and retention, identifying workplace burnout through to news classifiers and summarization, personal wellbeing assistants, and the generation of bedtime stories for children that make science and technology more accessible to them. We’ve been amazed by the breadth and pace of innovators building AI on top of MongoDB Atlas. In this blog post, I want to share an overview of three startups that have just qualified from our AI Innovators Program. Elevating the edge experience: Deploy AI anywhere with Cloneable and MongoDB Cloneable provides the application layer that brings AI to any device at the edge of the network. The Cloneable platform empowers developers to craft dynamic applications using intuitive low/no-code tools, instantly deployable to a spectrum of devices - mobiles, IoT devices, robots, and beyond. By harnessing machine learning models, a business can seamlessly leverage complex technologies across its operations. Models are pushed down to the device where they are converted to a native embedded format such as CoreML. From here, they are executed by the device’s neural engine to provide low latency inference, computer vision, and augmented reality. Cloneable uses MongoDB Atlas Device Sync to persist data locally on the device and sync it to the Atlas database backend in the cloud. This unique synergy creates an ecosystem where enterprise apps become real-time gateways to track, measure, inspect, and respond to events across an operation. The company is also exploring creating vector embeddings from images and data collected on devices and storing them in Atlas Vector Search . With this expanded functionality, users can better search and analyze events collected from the field. Predicting risks to public safety ExTrac draws on thousands of data sources identified by domain experts, using AI-powered analytics to locate, track and forecast both digital and physical risks to public safety in real-time. Initially serving Western governments to predict risks of emerging or escalating conflicts overseas, ExTrac is expanding into enterprise use cases for reputational management, operational risk, and content moderation. “Data is at the core of what we do. Our domain experts find and curate relevant streams of data, and then we use AI to anonymize and make sense of it at scale”, said Matt King, CEO at ExTrac. “We take a base model, such as RoBERTa or an LLM, and fine-tune it with our own labeled data to create domain-specific models capable of identifying and classifying threats in real-time.” Asked about why ExTrac built on MongoDB Atlas, King said “The flexibility of the document data model allows us to land, index, and analyze data of any shape and structure – no matter how complex. This helps us unlock instant insights for our customers.” King went on to say “ Atlas Vector Search is also proving to be incredibly powerful across a range of tasks where we use the results of the search to augment our LLMs and reduce hallucinations. We can store vector embeddings right alongside the source data in a single system, enabling our developers to build new features way faster than if they had to bolt-on a standalone vector database - many of which limit the amount of data that can be returned if it has meta-data attached to it. We are also moving beyond text to vectorize images and videos from our archives dating back over a decade. Being able to query and analyze data in any modality will help us to better model trends, track evolving narratives, and predict risk for our customers.” Access to technical expertise provided by the AI Innovators Program will help ExTrac manage the ever-growing size of its data sets – keeping performance high and costs low as the business scales. Using AI to cut maritime emissions and risk CetoAI provides predictive analytics for the maritime industry; combining high-frequency data, engineering expertise, and artificial intelligence the company reduces machinery breakdowns, cuts carbon emissions, and manages operational risk. Sensors installed onto each vessel generate real-time data feeds of engine and vessel performance. The data is used by the company’s AI models for predictive maintenance, optimizing fuel consumption, and carbon intensity forecasting, with the outputs consumed by the vessel’s crew, owners, and insurers. The data feeds generated by CetoAI are highly complex. Sensors on each vessel emit around 90,000 JSON documents daily. Each document stores around 100 unique time-series measurements, all requiring heavy-duty analytics processing before feeding machine learning models. It was these demands that led CetoAI’s engineering team to select MongoDB, migrating from a standalone time-series database that couldn’t keep pace with business growth. Sensor measurements from each data feed are streamed through Microsoft Azure’s IoT hub and ingested into MongoDB Atlas’ purpose-built time-series database collections . Here MongoDB window functions process and transform the data before serving it to CetoAI’s machine-learning models built with PyTorch and Scikit-Learn. CetoAI is now exploring additional capabilities available with MongoDB to expand its offerings. Atlas Device Sync can persist data locally and sync it between vessels and the cloud, withstanding the loss of network connectivity. Atlas Vector Search can be used for Retrieval Augmented Generation with the company’s LLMs. These are being developed to help crews diagnose and remediate equipment failures using natural language queries. Access to Atlas credits and expert support provided as part of the AI Innovators program enable CetoAI to accelerate and derisk the delivery of these new services. Get started Today we’ve focused on just three startups – there are many more that are already enjoying the benefits of the AI Innovators Program . We have more places left, but they are filling up fast, so sign up directly on the program’s web page. Also, check out our MongoDB for Artificial Intelligence resources page for all of the latest best practices to get you started on building the “next big thing” with AI.

August 16, 2023