AI-driven media personalization: MongoDB and Vector Search

Discover how to leverage MongoDB and Vector Search effectively to create more engaging, personalized user experiences.
Solution overview

In today’s rapidly evolving media landscape, publishers face unprecedented challenges. The surge in digital content has saturated the market, making it increasingly difficult to capture and retain audience attention. Furthermore, the decline in referral traffic, primarily from social media platforms and search engines, has put significant pressure on traditional media outlets. Publishers are seeking ways to stabilize their user base and enhance engagement in a sustainable way.

At MongoDB, we understand that the key to overcoming these challenges lies in leveraging data effectively to create more engaging, personalized user experiences. Our solution, specifically designed for large-scale media and publishing companies, harnesses the power of MongoDB and Atlas Vector Search to transform how content is delivered to users.

The essence of our approach is to deeply understand the user. By analyzing interactions and consumption patterns, the solution not only grasps what content resonates but also predicts what users are likely to engage with in the future. This insight enables publishers to construct a highly personalized content journey. To achieve this, we integrate several advanced capabilities.

Content suggestions and personalization

By combining user data, behavior analytics, and the multi-dimensional vectorization of media items, our platform suggests content that aligns with individual preferences and past interactions. This not only enhances user engagement but also increases the likelihood of converting free users into paying subscribers. Content is matched with k-nearest neighbor (k-NN) searches performed by MongoDB's vector search, and the vectors are embedded directly in MongoDB documents. This has several advantages. There is no polyglot persistence architecture to manage and no need to extract, transform, and load (ETL) data between different database systems, which simplifies the data architecture and reduces overhead. Furthermore, MongoDB’s built-in scalability and resilience are particularly valuable for vector search workloads: organizations can scale vertically or horizontally, and they can scale search nodes independently from operational database nodes to match the specific load scenario.
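As a minimal sketch of how such a nearest-neighbor lookup might be expressed, the following builds an aggregation pipeline around the `$vectorSearch` stage. The index name `vector_index`, the field names `embedding` and `title`, and the helper function itself are assumptions made for illustration; they are not taken from the demo.

```python
from typing import Any, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    limit: int = 5,
    num_candidates: int = 100,
    index_name: str = "vector_index",  # assumed index name
    path: str = "embedding",           # assumed field holding the vector
) -> list[dict[str, Any]]:
    """Return an aggregation pipeline that finds the `limit` nearest articles."""
    if limit > num_candidates:
        raise ValueError("limit must not exceed num_candidates")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                # Number of neighbors considered before the top `limit` are returned.
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            # Keep only a title (assumed field) and the similarity score.
            "$project": {
                "_id": 0,
                "title": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.12, -0.03, 0.45], limit=3)
```

With the PyMongo driver, a pipeline like this would typically be passed to `collection.aggregate(pipeline)`. The query vector would be the embedding of an article the user has read, or an aggregate of several such embeddings.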

Content summarization and reformatting

In an age of information overload, our solution provides concise summaries and adapts content formats based on user preferences and device specifications. This tailored approach addresses the diverse consumption habits of users across different platforms.

Keyword extraction

Advanced keyword extraction draws the essential information out of content, so users can grasp the key dimensions of a news story quickly and find content more easily within the platform. Keywords are fundamental to how content is indexed and found in search engines, and they significantly influence the SEO (search engine optimization) performance of digital content. In traditional publishing workflows, selecting these keywords is a manual, labor-intensive task that requires content creators to identify and incorporate relevant keywords meticulously. This process is not only time-consuming but also prone to human error: significant keywords are often overlooked or underutilized, which can diminish the content's visibility and engagement. Our solution instead uses the underlying LLM to extract keywords automatically.
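One way LLM-based keyword extraction could be wired up is sketched below. The prompt wording, the function names, and the `complete` callable (a stand-in for whichever LLM client the publisher uses) are all hypothetical; the sketch only shows the general shape of prompting for a JSON list and cleaning the result.

```python
import json
from typing import Callable


def build_keyword_prompt(article_text: str, max_keywords: int) -> str:
    """Ask the model for a JSON array so the answer is easy to parse."""
    return (
        f"Extract up to {max_keywords} keywords from the news article below. "
        "Answer with a JSON array of strings and nothing else.\n\n"
        f"{article_text}"
    )


def extract_keywords(
    article_text: str,
    complete: Callable[[str], str],  # hypothetical: sends a prompt, returns model text
    max_keywords: int = 8,
) -> list[str]:
    """Return lower-cased, de-duplicated keywords, or [] if the reply is unusable."""
    raw = complete(build_keyword_prompt(article_text, max_keywords))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    seen: set[str] = set()
    keywords: list[str] = []
    for item in parsed:
        if not isinstance(item, str):
            continue
        keyword = item.strip().lower()
        if keyword and keyword not in seen:
            seen.add(keyword)
            keywords.append(keyword)
    return keywords[:max_keywords]
```

Keeping the model call behind a plain callable means the same function works with any LLM provider, and the extracted keywords can be stored on the article document alongside its other fields.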

Automatic creation of insights and dossiers

Our system can automatically generate comprehensive insights and dossiers from multiple articles. This feature is particularly valuable for users interested in deep dives into specific topics or events, providing them with a rich, contextual experience. This capability leverages the power of one or more large language models (LLMs) to generate natural language output, enhancing the richness and accessibility of information derived from across multiple source articles. This process is agnostic to the specific LLMs used, providing flexibility and adaptability to integrate with any leading language model that fits the publisher's requirements. Whether the publisher chooses to employ more widely recognized models such as OpenAI's GPT series, or other emerging technologies, our solution seamlessly incorporates these tools to synthesize and summarize vast amounts of data. Here’s a deeper look at how this works:

  • Integration with Multiple Sources: The system pulls content from a variety of articles and data sources, retrieved with MongoDB Atlas Vector Search. Found items are then compiled into dossiers, which provide users with a detailed and contextual exploration of topics, curated to offer a narrative or analytical perspective that adds value beyond the original content.

  • Customizable Output: The output is highly customizable. Publishers can set parameters based on their audience’s preferences or specific project requirements. This includes adjusting the level of detail, the use of technical versus layman terms, and the inclusion of multimedia elements to complement the text.
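The two points above can be sketched as a prompt-assembly step that takes the articles retrieved by vector search together with the publisher's output settings. The option names, their values, and the `title` and `text` fields are assumptions for illustration only; the resulting prompt would be sent to whichever LLM the publisher has chosen.

```python
from dataclasses import dataclass


@dataclass
class DossierOptions:
    """Hypothetical output settings a publisher might expose."""
    detail_level: str = "brief"   # e.g. "brief" or "detailed"
    vocabulary: str = "layman"    # e.g. "layman" or "technical"
    include_media: bool = False   # whether to suggest multimedia elements


def build_dossier_prompt(topic: str, articles: list[dict], options: DossierOptions) -> str:
    """Combine retrieved articles and output settings into one LLM prompt."""
    # Number the sources so the model can refer back to them.
    sources = "\n\n".join(
        f"[{i}] {article['title']}\n{article['text']}"
        for i, article in enumerate(articles, start=1)
    )
    media_hint = (
        "Suggest where images or video could complement the text."
        if options.include_media
        else "Do not reference multimedia elements."
    )
    return (
        f"Write a {options.detail_level} dossier on the topic '{topic}'.\n"
        f"Use {options.vocabulary} vocabulary. {media_hint}\n"
        "Base every statement only on the numbered source articles below.\n\n"
        f"{sources}"
    )


retrieved = [
    {"title": "Placeholder headline one", "text": "Placeholder body one."},
    {"title": "Placeholder headline two", "text": "Placeholder body two."},
]
prompt = build_dossier_prompt("example topic", retrieved, DossierOptions(detail_level="detailed"))
```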

This feature significantly enhances user engagement by delivering highly personalized and context-rich content. It caters to users looking for quick summaries as well as those seeking in-depth analyses, thereby broadening the appeal of the platform and encouraging deeper interaction with the content. By using LLMs to automate these processes, publishers can maintain a high level of productivity and innovation in content creation, ensuring they remain at the cutting edge of media technology.

Other applicable industries and use cases

The core concepts of the solution above can be reused in other industries. One example is retail, where matching the right product to the right user is essential to keeping sales high.

Reference architecture
Demo application

We have developed a showcase of the solution. It is available at and incorporates the concepts discussed above.


The underlying data model is straightforward; a representative news article looks like this:

representative news article
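For readers without access to the image, a document of this kind might be shaped roughly as follows. Every field name and value here is a hypothetical placeholder rather than the demo's actual schema; the point is only that the vector lives in the same document as the article's other fields.

```python
from datetime import datetime, timezone

# Placeholder vector length; real embeddings come from the embedding model.
EMBEDDING_DIMENSIONS = 1536

article = {
    "title": "Placeholder headline",
    "section": "business",
    "published": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "text": "Placeholder article body.",
    "keywords": ["placeholder", "example"],
    # The vector is stored directly in the document, next to the content.
    "embedding": [0.0] * EMBEDDING_DIMENSIONS,
}
```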

Embeddings are calculated using the OpenAI model text-embedding-ada-002. A Vector Index has been created from the MongoDB Atlas web interface like this:

media demo business news
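As a hedged sketch, a vector index definition for this setup could look like the following, written as a Python dictionary and printed as the JSON that the Atlas index editor accepts. The field path `embedding` and the choice of cosine similarity are assumptions; the dimension count of 1536 matches the output size of text-embedding-ada-002.

```python
import json

vector_index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",     # assumed name of the field holding the vector
            "numDimensions": 1536,   # output size of text-embedding-ada-002
            "similarity": "cosine",  # assumed; "euclidean" and "dotProduct" also exist
        }
    ]
}

# Print the definition as JSON, ready to paste into an index editor.
print(json.dumps(vector_index_definition, indent=2))
```

The `numDimensions` value has to match the embedding model, so switching to a different model generally means updating the index definition and re-embedding the stored articles.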
Technologies and products used
MongoDB developer data platform
  • Benjamin Lorenz, MongoDB
Related resources

GitHub Repository: News Demo

Create a local version of this demo by following the instructions in the repository.


White Paper: AI-Powered Innovation in Telecommunications and Media

Learn how leading telco and media organizations are leveraging AI technology to build innovative solutions.


E-book: MongoDB for Telecommunications

Learn how MongoDB can support the telecommunications industry.

Get started with Atlas

Get started in seconds. Our free clusters come with 512 MB of storage so you can experiment with sample data and get familiar with our platform.