Gen AI-Powered Video Summarization

Build a YouTube transcription and summarization service with a large language model (LLM) and semantic search.

Use cases: Gen AI

Industries: Media

Products: MongoDB Atlas, MongoDB Atlas Vector Search

Partners: LangChain

Solution Overview

Platforms such as YouTube host a large and varied body of informational content, so the ability to quickly find relevant videos, then transcribe and summarize them, is important for knowledge gathering.

This solution builds a Generative AI-powered video summarization app for transcribing and summarizing YouTube videos. The application uses an LLM and vector embeddings with Atlas Vector Search for video-to-text generation and semantic searches. This approach can assist industries such as software development, where professionals can learn technologies faster with Gen AI video summarization.

Reference Architectures

Without MongoDB, a video summarization tool uses the following workflow:

Figure 1. Reference architecture without MongoDB

This solution uses the following architecture with MongoDB:

Figure 2. Reference architecture with MongoDB

First, the solution uses YouTubeLoader to process YouTube links and get video metadata and transcripts. Then, a Python script fetches and summarizes the video transcript using an LLM.
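Before a link reaches the loader, the script has to know which video it points to. The following minimal sketch, using only the Python standard library, shows one way to pull the video ID out of the two link shapes used in this document (youtube.com/watch?v=... and youtu.be/...). The helper name extract_video_id is hypothetical and not taken from the solution's repository. LangChain's loader can also parse URLs itself, so treat this as an illustration of the step rather than the solution's code.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube watch or short link, or None if absent."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in ("youtube.com", "m.youtube.com"):
        # Watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    # Not a recognized YouTube host.
    return None
```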

Voyage AI embedding models then convert the summarized transcripts into embeddings, which are stored in MongoDB Atlas. In addition, the solution uses Optical Character Recognition (OCR) and AI to analyze code shown in video frames in real time, producing a searchable, text-based version of the video information along with an AI-generated explanation.

The solution stores the processed data in MongoDB Atlas documents that contain the video metadata, transcript, and AI-generated summary. Users can then search these documents with MongoDB Atlas Vector Search.
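Semantic search ranks documents by how close their embedding vectors are to the query's embedding, rather than by shared keywords. The following brute-force sketch illustrates that idea with cosine similarity over toy vectors; the function names and the three-dimensional example vectors are hypothetical. Atlas Vector Search performs this comparison on the server against indexed embeddings, so this code only shows the concept, not how Atlas implements it.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], docs: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the top-k (name, score) pairs."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, rank_by_similarity([1.0, 0.0, 0.0], {"go-tutorial": [0.9, 0.1, 0.0], "cooking": [0.0, 0.2, 0.9]}, k=1) returns the go-tutorial entry first, because its vector points in nearly the same direction as the query even though no keywords are compared.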

Data Model Approach

The following code block is an example of the documents generated by this solution:

{
  "videoURL": "https://youtu.be/exampleID",
  "metadata":{
    "title": "How to use GO with MongoDB",
    "author": "MongoDB",
    "publishDate": "2023-01-24",
    "viewCount": 1449,
    "length": "1533s",
    "thumbnail": "https://exmpl.com/thumb.jpg"
  },
  "transcript": "Full transcript…",
  "summary": "Tutorial on using Go with MongoDB.",
  "codeAnalysis": [
    "Main function in Go initializes the MongoDB client.",
    "Imports AWS Lambda package for serverless architecture."
  ]
}

The data extracted from each YouTube video consists of the following:

videoURL: A direct link to the YouTube video.
metadata: Video details such as title, author, publish date, view count, length, and thumbnail URL.
transcript: A textual representation of the spoken content in the video.
summary: A concise, AI-generated version of the transcript.
codeAnalysis: A list of AI-analyzed code examples.
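The following sketch assembles a document with the same field names as the example above. The function name build_video_document is hypothetical; the solution's repository may build its documents differently.

```python
from typing import Any, Dict, List, Optional


def build_video_document(
    video_url: str,
    metadata: Dict[str, Any],
    transcript: str,
    summary: str,
    code_analysis: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble one video document using the field names from the example."""
    return {
        "videoURL": video_url,
        # Copy so later changes to the caller's dict don't leak into the document.
        "metadata": dict(metadata),
        "transcript": transcript,
        "summary": summary,
        # Videos without code frames get an empty list rather than a missing field.
        "codeAnalysis": list(code_analysis or []),
    }
```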

Build the Solution

The code for this solution is available in the GitHub repository. Follow the README for more specific instructions that walk you through the following procedure:

Set up the environment

Install the required libraries, including LangChain, PyMongo, and any other domain- or service-specific libraries. JSON handling uses Python's built-in json module, so it does not need to be installed separately.

Configure and start the application

Add your OpenAI API key and MongoDB Atlas connection details to a config.ini file. Then, run the following command to generate the transcription for a given YouTube video:

python main.py --youtube "https://www.youtube.com/watch?v=sample_id"
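The main.py entry point reads config.ini and accepts either --youtube or --searchFor. The following minimal sketch shows one way to do both with the standard library's configparser and argparse modules. The section and key names ([openai] api_key, and [mongodb] uri, database, collection), the Settings class, and the rule that the two flags are mutually exclusive are assumptions for illustration; check the README for the file layout the solution actually expects.

```python
import argparse
import configparser
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Settings:
    openai_api_key: str
    mongodb_uri: str
    database: str
    collection: str


def parse_settings(text: str) -> Settings:
    """Parse config.ini contents (hypothetical section and key names)."""
    # interpolation=None keeps "%" characters (for example, URL-encoded passwords) literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return Settings(
        openai_api_key=parser.get("openai", "api_key"),
        mongodb_uri=parser.get("mongodb", "uri"),
        database=parser.get("mongodb", "database"),
        collection=parser.get("mongodb", "collection"),
    )


def load_settings(path: str = "config.ini") -> Settings:
    with open(path, encoding="utf-8") as handle:
        return parse_settings(handle.read())


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Accept exactly one mode: summarize a video or search stored summaries."""
    parser = argparse.ArgumentParser(description="Summarize or search YouTube videos.")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--youtube", metavar="URL", help="video to transcribe and summarize")
    mode.add_argument("--searchFor", metavar="QUERY", help="natural-language query over stored summaries")
    return parser.parse_args(argv)
```

Quoting the URL on the command line matters because some shells, such as zsh, treat the unquoted ? as a glob character.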

Load YouTube videos

The YouTube loader fetches the video metadata and transcript. If summarization fails, the error is caught and stored in the summary field for that video, so processing can continue.
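The error handling described above can be sketched as follows. The loader and summarizer are passed in as plain callables, a hypothetical design that lets you plug in LangChain's loader and your LLM client, or stubs for testing. Any exception raised during summarization is stored as text in the summary field instead of stopping the run. The function name process_video and the error-message prefix are assumptions.

```python
from typing import Any, Callable, Dict


def process_video(
    url: str,
    load: Callable[[str], Dict[str, Any]],
    summarize: Callable[[str], str],
) -> Dict[str, Any]:
    """Load one video and summarize it, recording a summarization error if one occurs."""
    # Hypothetical contract: load returns {"metadata": {...}, "transcript": "..."}.
    record = load(url)
    try:
        record["summary"] = summarize(record["transcript"])
    except Exception as exc:
        # Store the failure on this video's record so other videos can still be processed.
        record["summary"] = f"Error during summarization: {exc}"
    record["videoURL"] = url
    return record
```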

Store the data in MongoDB Atlas

The compiled data, including the video summary, is serialized to JSON and saved to individual files named video_transcript_<index>.json.
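A minimal sketch of this step with the standard json module follows. The function name save_transcripts and the choice to start the index at 0 are assumptions; the repository may number files differently.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def save_transcripts(records: Iterable[Dict[str, Any]], out_dir: str = ".") -> List[Path]:
    """Write each record to video_transcript_<index>.json and return the file paths."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, record in enumerate(records):
        path = directory / f"video_transcript_{index}.json"
        # ensure_ascii=False keeps non-ASCII transcript text readable in the file.
        path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
        paths.append(path)
    return paths
```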

Create a MongoDB Atlas Vector Search index

Convert the summarized transcripts into embeddings for Atlas Vector Search and store them in MongoDB Atlas.
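One way to keep each vector next to the data it describes is to add it as a field on the same document. In the following sketch, embed is a hypothetical callable that wraps your embedding provider (for example, a Voyage AI client) and returns a list of floats. The field name summaryEmbedding is also an assumption, and whatever name you choose must match the path used in the Vector Search index.

```python
from typing import Any, Callable, Dict, List


def with_summary_embedding(
    document: Dict[str, Any],
    embed: Callable[[str], List[float]],
    field: str = "summaryEmbedding",
) -> Dict[str, Any]:
    """Return a copy of the document with the summary's embedding stored in `field`."""
    enriched = dict(document)  # shallow copy; the caller's document is left unchanged
    enriched[field] = [float(x) for x in embed(document["summary"])]
    return enriched
```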

To learn how to use Atlas Vector Search and create an index, see MongoDB Vector Search Quick Start.

The following figure shows parameter values you can use when creating your Vector Search index.

Figure 3. Storing data in MongoDB Atlas with Vector Search
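Atlas Vector Search indexes are defined in JSON with a fields array. The following sketch builds such a definition in Python. The field path summaryEmbedding, the default of 1024 dimensions, and the optional metadata.author filter field are placeholder values, not the values shown in Figure 3; numDimensions must equal the output size of the embedding model you use. The resulting JSON can be pasted into the Atlas UI's JSON editor or, in recent PyMongo versions, passed as the definition of a SearchIndexModel with type="vectorSearch".

```python
import json
from typing import Any, Dict, List, Optional

SUPPORTED_SIMILARITIES = ("euclidean", "cosine", "dotProduct")


def vector_index_definition(
    path: str = "summaryEmbedding",
    num_dimensions: int = 1024,
    similarity: str = "cosine",
    filter_paths: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build an Atlas Vector Search index definition (placeholder values)."""
    if similarity not in SUPPORTED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    fields: List[Dict[str, Any]] = [
        {"type": "vector", "path": path, "numDimensions": num_dimensions, "similarity": similarity}
    ]
    # Filter fields let a $vectorSearch query pre-filter on exact values.
    for filter_path in filter_paths or []:
        fields.append({"type": "filter", "path": filter_path})
    return {"fields": fields}


if __name__ == "__main__":
    print(json.dumps(vector_index_definition(filter_paths=["metadata.author"]), indent=2))
```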

Create an orchestration layer

The solution uses an orchestration layer to coordinate the solution's various services and manage complex workflows. The orchestration layer is composed of the following classes, which you can find in the solution's GitHub repository:

VideoServiceFacade: Coordinates the VideoService, SearchService, and VideoProcessResult classes, and handles user prompts and requests for transcript generation and summarization.
VideoService: Performs transcript summarization.
VideoProcessResult: Encapsulates the processed video results, including metadata, possible actions, and optimal search query terms.
SearchService: Performs a search in MongoDB Atlas.
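The following sketch shows how a facade like VideoServiceFacade can route requests to the other classes. The class names come from the list above, but every method, constructor argument, and field here is hypothetical; the real implementations are in the solution's GitHub repository. Injecting the summarizer and search function as callables keeps the facade independent of any specific LLM or database client.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class VideoProcessResult:
    """Processed output for one video (illustrative fields)."""
    metadata: Dict[str, Any]
    summary: str
    actions: List[str] = field(default_factory=list)
    search_terms: List[str] = field(default_factory=list)


class VideoService:
    def __init__(self, summarize: Callable[[str], str]) -> None:
        self._summarize = summarize

    def process(self, metadata: Dict[str, Any], transcript: str) -> VideoProcessResult:
        return VideoProcessResult(metadata=metadata, summary=self._summarize(transcript))


class SearchService:
    def __init__(self, run_search: Callable[[str], List[Dict[str, Any]]]) -> None:
        self._run_search = run_search

    def search(self, query: str) -> List[Dict[str, Any]]:
        return self._run_search(query)


class VideoServiceFacade:
    """Single entry point that routes each request to the matching service."""

    def __init__(self, video_service: VideoService, search_service: SearchService) -> None:
        self._videos = video_service
        self._searcher = search_service

    def summarize(self, metadata: Dict[str, Any], transcript: str) -> VideoProcessResult:
        return self._videos.process(metadata, transcript)

    def search(self, query: str) -> List[Dict[str, Any]]:
        return self._searcher.search(query)
```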

Search for a phrase

Run the following command to query the vectorized information:

python main.py --searchFor "your_search_query_here"
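For a vector query, the search text is first converted into an embedding with the same model used for the stored summaries, and that vector is passed to a $vectorSearch aggregation stage. The following sketch builds such a pipeline. The index name vector_index, the field path summaryEmbedding, and the candidate multiplier are assumptions. With PyMongo, you would run the result with collection.aggregate(pipeline).

```python
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    index_name: str = "vector_index",
    path: str = "summaryEmbedding",
    limit: int = 5,
    candidate_multiplier: int = 20,
) -> List[Dict[str, Any]]:
    """Return an aggregation pipeline that runs $vectorSearch and projects the score."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                # Nearest neighbors to consider before keeping the top `limit`; must be >= limit.
                "numCandidates": limit * candidate_multiplier,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "videoURL": 1,
                "metadata.title": 1,
                "summary": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```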

Key Learnings

Atlas Vector Search enables natural language search: This solution creates vector indexes with Atlas Vector Search and stores the embeddings and LLM-generated outputs in MongoDB Atlas. Users can then search a single platform for relevant, previously unstructured information, even when it contains no exact keyword matches.
LangChain facilitates Gen AI-powered applications: LangChain provides building blocks such as its YouTube document loader and integrates with MongoDB Atlas, which simplifies building an AI-driven application on MongoDB.

Authors

Fabio Falavinha, MongoDB
David Macias, MongoDB
