Use cases: Content Management, Gen AI
Industries: Media, Telecommunications
Products and Tools: MongoDB Atlas, MongoDB Atlas Vector Search
Partners: Amazon Bedrock
Solution Overview
Content teams face increasing pressure to produce engaging and credible content in a fast-paced news environment. Traditional methods divert time from content creation to manual research, source verification, and tool management, often leading to creative fatigue and missed opportunities. With MongoDB, you can combine generative AI with MongoDB's adaptable data infrastructure to optimize editorial operations. To help you test these capabilities, we provide the Content Lab demo, a solution you can replicate.
The Content Lab demo streamlines editorial workflows and allows you to:
Ingest and structure diverse content: This demo efficiently processes high volumes of unstructured and semi-structured content from various sources, dynamically organizing it by topic, industry, and source metadata.
Enable AI-powered discovery and drafting: Embedding models and MongoDB Atlas Vector Search transform raw content into structured, searchable data. This combination enables semantic retrieval of trending topics and automates content drafting, reducing creative fatigue.
Enhance content credibility: This demo captures and stores source URLs, which are then embedded directly into topic suggestions. Integration with external search agents further enriches content suggestions with contextual information.
Facilitate personalization and boost workflow efficiency: This demo processes the user's profile to deliver personalized writing suggestions and stores drafts for version control and reuse. MongoDB’s flexible schema makes this possible by adapting effortlessly to evolving profile data, draft formats, and new content types without disrupting the workflow.
Figure 1. User journey flow diagram
By providing a unified storage solution, real-time insights, and automated content assistance, this demo shows how MongoDB helps editorial teams reduce complexity, enhance content quality, and accelerate production. It offers publishers a clear path from idea to publication.
Reference Architecture
The Content Lab demo provides an AI-driven publishing tool that combines Gen AI with MongoDB's flexible data infrastructure to streamline editorial operations. The architecture follows a microservices design to:
Handle diverse content ingestion
Drive AI-powered discovery and drafting
Enhance content credibility
Support personalization and workflow efficiency
Figure 2. High-level architecture of the Content Lab demo
This architecture uses the following components:
User interface (UI): Users interact with the system through a UI that provides features like topic suggestions, drafting tools, and draft management.
Backend services: These microservices handle different functions of the demo, including:
Content analysis and suggestions backend: This service processes news and Reddit data, transforming content into semantic vectors through embedding models like Cohere Embed. These vectors can then be processed with Atlas Vector Search to provide real-time topic suggestions. The microservice has these major components:
Scheduler and orchestration: This service automates ingestion, embedding generation, and topic suggestion workflows daily.
Role: This service supplies semantic search and retrieval results that power downstream writing assistance and personalization in the writing assistant microservice.
Below you can find a high-level overview diagram of this microservice.
Figure 3. High-level architecture of the content and suggestions backend
Writing assistant backend: This service provides tools for publishing, which include draft outlining, proofreading, content refinement, and chat completion. These tools use LLMs such as Anthropic Claude via Amazon Bedrock.
MongoDB Atlas: Atlas serves as the primary data store, providing semantic search capabilities, database storage, and aggregation pipelines for efficient processing and retrieval.
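To make the retrieval step concrete, the sketch below builds the kind of $vectorSearch aggregation pipeline the suggestions backend could run against the news collection. The index name, embedding field name, and candidate counts are illustrative assumptions, not taken from the demo's actual code; with a live cluster, the pipeline would be passed to a pymongo collection's aggregate() call.

```python
# Sketch of a $vectorSearch aggregation pipeline, similar to what the
# suggestions backend could run against the "news" collection.
# Index name, field names, and candidate counts are assumptions.

def build_topic_search_pipeline(query_vector, limit=5):
    """Return an aggregation pipeline that finds news documents
    semantically similar to the query embedding."""
    return [
        {
            "$vectorSearch": {
                "index": "news_vector_index",  # assumed index name
                "path": "embedding",           # assumed embedding field
                "queryVector": query_vector,
                "numCandidates": 100,
                "limit": limit,
            }
        },
        {
            # Keep only the fields the suggestion step needs,
            # plus the similarity score for ranking.
            "$project": {
                "title": 1,
                "url": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

pipeline = build_topic_search_pipeline([0.1, 0.2, 0.3])
# With a live cluster this would run as: db.news.aggregate(pipeline)
```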
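For the writing assistant side, the sketch below assembles a request body in the shape the Claude Messages API expects on Amazon Bedrock. The prompt wording and token limit are illustrative assumptions; the commented invoke_model call shows how a boto3 bedrock-runtime client would send it, with a model ID such as anthropic.claude-3-haiku-20240307-v1:0.

```python
import json

# Sketch of the request body the writing assistant could send to
# Anthropic Claude through the Amazon Bedrock runtime API.
# The prompt wording and token limit are illustrative assumptions.

def build_proofread_request(draft_text, max_tokens=1024):
    """Return the JSON body for a Claude Messages API call on Bedrock."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": "Proofread the following draft and fix any "
                           "grammar or clarity issues:\n\n" + draft_text,
            }
        ],
    })

body = build_proofread_request("MongoDB Atlas store the embeddings.")
# With AWS credentials configured, this body would be sent via a
# boto3 bedrock-runtime client, e.g.:
#   client.invoke_model(
#       modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body)
```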
Data Model Approach
This demo uses the following document model design and collections to store content.
There are five main collections in the Content Lab demo:
userProfile
reddit_posts
news
suggestions
drafts
The userProfile collection stores individual user preferences to tailor personalized AI-driven suggestions. These preferences include:
persona: The type of writer the user can choose.
tone: The desired tone the user can choose, for example, casual, formal, or semi-formal.
styleTraits: The predefined characteristics of the writer.
sampleText: An example sentence from the writer.
This schema follows the MongoDB design principle that data frequently accessed together is stored together, enabling the writing assistant to quickly retrieve user recommendations. A sample document is shown below.
{
  "_id": { "$oid": "6862a8988c0f7bf43af995a8" },
  "persona": "The Formal Expert",
  "userName": "Mark S.",
  "tone": "Polished, academic, appeals to professionals and older readers",
  "styleTraits": [
    "Long, structured paragraphs",
    "Formal language with rich vocabulary",
    "Analytical, often includes references or citations"
  ],
  "sampleText": "This development represents..."
}
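A minimal sketch of how such a profile document could be assembled and validated before insertion is shown below. The field names mirror the sample document above; the required-field check is an assumption for illustration, not part of the demo's code.

```python
# Sketch: assemble a userProfile document shaped like the sample above.
# The required-field validation is an illustrative assumption.

REQUIRED_FIELDS = ("userName", "persona", "tone", "styleTraits", "sampleText")

def build_user_profile(user_name, persona, tone, style_traits, sample_text):
    """Return a userProfile document, rejecting empty fields."""
    doc = {
        "userName": user_name,
        "persona": persona,
        "tone": tone,
        "styleTraits": list(style_traits),
        "sampleText": sample_text,
    }
    missing = [f for f in REQUIRED_FIELDS if not doc.get(f)]
    if missing:
        raise ValueError("missing profile fields: " + ", ".join(missing))
    return doc

profile = build_user_profile(
    "Mark S.",
    "The Formal Expert",
    "Polished, academic",
    ["Long, structured paragraphs"],
    "This development represents...",
)
# With a live cluster: db.userProfile.insert_one(profile)
```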
The reddit_posts and news collections store raw data ingested
from their respective APIs. These documents are further enriched with
embeddings, which are numerical representations of the content's meaning
that enable semantic search.
The suggestions collection contains the topics suggested from the
processed reddit_posts and news data. The UI can easily find
these documents and use them for topic selection. A sample document is
shown below.
{
  "_id": { "$oid": "686fb23055303796c4f37b7e" },
  "topic": "Backlash against generative AI",
  "keywords": [
    "algorithmic bias",
    "data privacy",
    "AI regulation",
    "public trust"
  ],
  "description": "As generative AI tools like ChatGPT proliferate, a growing public backlash highlights concerns over their negative impacts and the need for stronger oversight.",
  "label": "technology",
  "url": "https://www.wired.com/story/generative-ai-backlash/",
  "type": "news_analysis",
  "analyzed_at": { "$date": "2025-07-10T12:29:36.277Z" },
  "source_query": "Viral social media content"
}
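As one way the UI layer could rank these documents for topic selection, the sketch below drops duplicate URLs and shows the most recently analyzed topics first. The ranking rule is an assumption for illustration; in practice this could also be expressed as an aggregation pipeline.

```python
from datetime import datetime, timezone

# Sketch: rank suggestion documents for the topic-selection UI by
# deduplicating on URL and sorting newest first (assumed ranking rule).

def rank_suggestions(suggestions):
    """Return suggestions with duplicate URLs removed, newest first."""
    seen, unique = set(), []
    for s in suggestions:
        if s["url"] not in seen:
            seen.add(s["url"])
            unique.append(s)
    return sorted(unique, key=lambda s: s["analyzed_at"], reverse=True)

ranked = rank_suggestions([
    {"topic": "AI backlash", "url": "https://example.com/a",
     "analyzed_at": datetime(2025, 7, 10, tzinfo=timezone.utc)},
    {"topic": "AI backlash (duplicate)", "url": "https://example.com/a",
     "analyzed_at": datetime(2025, 7, 9, tzinfo=timezone.utc)},
    {"topic": "Data privacy", "url": "https://example.com/b",
     "analyzed_at": datetime(2025, 7, 11, tzinfo=timezone.utc)},
])
```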
Finally, the drafts collection stores users’ drafts. Each draft is
associated with a suggested topic, allowing for easy organization and
retrieval. This model ensures persistence, version control, and content
reusability for editorial workflows.
Build the Solution
You can replicate this demo by following these steps:
Fork and clone repositories
Fork and clone the backend #1, backend #2, and frontend repos to your GitHub account.
Provision MongoDB Atlas
Within your MongoDB Atlas account, create a cluster
and a database named contentlab with these collections:
drafts: Store user-created draft documents.
news: Store scraped news articles with embeddings.
reddit_posts: Store Reddit posts and comments with embeddings.
suggestions: Store AI-generated topic suggestions.
userProfiles: Store user profile information and preferences.
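For the collections that hold embeddings, you also need an Atlas Vector Search index. The sketch below builds one possible index definition; the embedding field name and the dimension count (1024, matching Cohere Embed English v3) are assumptions you should adjust to the embedding model you actually use.

```python
# Sketch of an Atlas Vector Search index definition for an
# embeddings collection. Field name and dimensions are assumptions
# (1024 matches Cohere Embed English v3).

def vector_index_definition(num_dimensions=1024):
    """Return a vector index definition for an embeddings field."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": "embedding",        # assumed embedding field
                "numDimensions": num_dimensions,
                "similarity": "cosine",
            }
        ]
    }

index_def = vector_index_definition()
# Attach this definition to a search index (e.g. on contentlab.news)
# through the Atlas UI, the Atlas Admin API, or
# pymongo's create_search_index with type "vectorSearch".
```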
Install dependencies and run services
Install and start both backend services on ports 8000 and 8001. Then, install frontend dependencies and launch the dev server at http://localhost:3000.
Key Learnings
Adapt data models with MongoDB’s flexible schema: With MongoDB, you can seamlessly add new fields or adapt existing ones, such as custom metadata, summaries, and version histories, in your collections without downtime or complex migrations.
Integrate Atlas Vector Search for meaningful discovery: With MongoDB, you can store embeddings from various APIs in their respective collections and then run similarity queries to uncover relevant topics in seconds.
Ensure editorial trust by tracking content sources: With MongoDB, you can store source URLs and metadata alongside suggestions, making it easy to verify origins and preserve credibility in drafts.
Maintain a constant stream of ideas by automating your pipeline: With MongoDB, you can schedule daily jobs to scrape news, process embeddings, and generate suggestions, keeping topic recommendations up to date.
Authors
Aswin Subramanian Maheswaran, MongoDB
Felipe Trejos, MongoDB
Learn More
To understand how Atlas Vector Search powers semantic search and enables real-time analytics, visit the Atlas Vector Search page.
To learn how MongoDB is transforming media operations, read the AI-Powered Media Personalization: MongoDB and Vector Search article.
To discover how MongoDB supports modern media workflows, visit the MongoDB for Media and Entertainment page.