How can I load PDFs into Atlas with vector search in mind?

Hello,

I have a lot of unstructured data (PDFs, podcasts, webcasts) that I'd like to add to Atlas with vector search enabled.

Please share an article or sample showing how to do this, preferably in .NET or via the CLI.

Thanks,
Armen

Are you looking to build an application in .NET, or is this for learning purposes? How familiar are you with Python? I ask because there are multiple frameworks that make this much easier than building it from scratch (which is also possible).

Thanks for the reply. I have a PoC in mind that uses MongoDB as the backend, with a chatbot on top of it for fast search. Python is fine, but C# is preferred.

I am looking to incorporate the PoC into M365. Is there a way to build a quick prototype before getting into the details?

Chatbots built on vector search are usually implemented with a pattern called Retrieval-Augmented Generation (RAG).
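To make the RAG pattern concrete, here is a minimal offline sketch in Python of its retrieve-then-augment loop. It is an illustration under stated assumptions, not how any of the frameworks below work internally: `toy_embed` is a hypothetical stand-in for a real embedding model, the in-memory ranking stands in for what Atlas `$vectorSearch` does server-side, and the final generation call to an LLM is left out.

```python
import math
import zlib


def toy_embed(text: str, dims: int = 256) -> list[float]:
    """Hypothetical stand-in for a real embedding model: hashes each word
    into a fixed-size count vector so the example runs offline."""
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Retrieval step: rank stored chunks by similarity to the question.
    With Atlas, this ranking would be done by $vectorSearch instead."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Augmentation step: ground the LLM prompt in the retrieved chunks."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    docs = [
        "Atlas Vector Search indexes embeddings stored in MongoDB documents",
        "Podcast transcripts can be chunked and embedded like PDF text",
        "Bananas are a yellow fruit",
    ]
    question = "How does Atlas Vector Search index embeddings"
    # The printed prompt is what would be sent to a chat model (generation step).
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

In a real PoC the only parts that change are the embedding call (a real model) and where the ranking happens (Atlas); the prompt-building step stays essentially the same.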

Here are a few resources:

  1. Conceptual ideas:
    MongoDB has open-sourced a chatbot framework: Taking RAG to Production with the MongoDB Documentation AI Chatbot | MongoDB
    RAG with Atlas Vector Search, LangChain, and OpenAI | MongoDB

  2. Microsoft Semantic Kernel: a development framework for building RAG applications, with a MongoDB integration in both its C# and Python versions. Here's the C# integration: https://github.com/microsoft/semantic-kernel/tree/main/dotnet/src/Connectors/Connectors.Memory.MongoDB. Here is a Python tutorial for performing RAG: Building AI Applications with Microsoft Semantic Kernel and MongoDB Atlas Vector Search | MongoDB

  3. RAG 101 (PDF to chatbot using LangChain in Python): https://github.com/prakul/MongoDB-AI-Resources/blob/main/Langchain_%2B_MongoDB_101.ipynb
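Whichever framework you pick, the ingestion side for PDFs, podcasts and webcasts follows the same shape: get text (PDF extraction, or transcription for audio/video), split it into chunks, embed each chunk, and store the chunks as documents. The sketch below covers the chunk-and-shape part in plain Python. Text extraction is assumed to have happened already (for example with a PDF library such as pypdf); the field names `source`, `chunk_index`, `text` and `embedding` are assumptions, and so is `numDimensions: 1536`, which must match whatever embedding model you actually use.

```python
from typing import Callable


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split extracted text into overlapping character windows.
    Overlap keeps a sentence that straddles a boundary retrievable from either chunk."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def to_documents(chunks: list[str], source: str,
                 embed: Callable[[str], list[float]]) -> list[dict]:
    """Shape each chunk as a MongoDB document; `embed` is whatever embedding
    function you use, and 'embedding' is the field the vector index points at."""
    return [
        {"source": source, "chunk_index": i, "text": c, "embedding": embed(c)}
        for i, c in enumerate(chunks)
    ]


# Atlas Vector Search index definition (index type "vectorSearch").
# numDimensions is an assumption: it must equal your embedding model's output size.
VECTOR_INDEX_DEFINITION = {
    "fields": [
        {"type": "vector", "path": "embedding",
         "numDimensions": 1536, "similarity": "cosine"},
        {"type": "filter", "path": "source"},  # lets queries pre-filter by file
    ]
}
```

With PyMongo, the resulting list can be written with `collection.insert_many(...)`, and the index created from the definition above in the Atlas UI. Character-based chunking is just the simplest option; LangChain and Semantic Kernel ship their own text splitters.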

You can implement these steps in C#; the docs page for the $vectorSearch stage has C# code samples: https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/
(choose your language at the top right).
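For reference, the query side boils down to a single aggregation stage. Here is a minimal Python sketch that builds a `$vectorSearch` pipeline as plain dicts. The index name `vector_index` and the `embedding`/`text`/`source` field names are assumptions that have to match your own collection and index; the C# equivalents are on the page linked above.

```python
from typing import Optional


def build_vector_search_pipeline(
    query_vector: list[float],
    index: str = "vector_index",  # hypothetical name; use the index you created in Atlas
    path: str = "embedding",      # field holding the stored embeddings
    limit: int = 5,
    num_candidates: int = 100,
    source: Optional[str] = None,
) -> list[dict]:
    """Build an aggregation pipeline that runs $vectorSearch and returns the
    matched chunk text together with its similarity score."""
    if num_candidates < limit:
        raise ValueError("numCandidates must be >= limit")
    stage = {
        "index": index,
        "path": path,
        "queryVector": query_vector,
        "numCandidates": num_candidates,  # nearest-neighbour candidates considered
        "limit": limit,                   # documents actually returned
    }
    if source is not None:
        # Pre-filter; the field must be indexed with type "filter" in the vector index.
        stage["filter"] = {"source": {"$eq": source}}
    return [
        {"$vectorSearch": stage},
        {"$project": {"_id": 0, "text": 1, "source": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]
```

With PyMongo this would be run as `collection.aggregate(build_vector_search_pipeline(query_vector))`, where `query_vector` must come from the same embedding model used at ingestion; the returned `text` fields are what you feed into the chatbot's prompt.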
