Adding metadata to a MongoDB vector database

Hello,

I am using MongoDB as a vector database with LangChain. I would like to add metadata to each document
and use that metadata to filter the results.
Can someone guide me?

loader = WebBaseLoader(["http://mongodb.com"])

data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=500)
docs = text_splitter.split_documents(data)
metadata = {"user-id": "your-user-id"}
# Add metadata to all docs here
client = MongoClient(self.config.mongodb_uri)
MONGODB_COLLECTION = client[self.config.vector_db_name][self.config.collection_name]
MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    collection=MONGODB_COLLECTION,
    index_name=self.config.search_index_name,
    metadata=metadata,
)

And for retrieval:

# Add pre-filter here.

vector_search = MongoDBAtlasVectorSearch.from_connection_string(
    self.config.mongodb_uri,
    self.config.vector_db_name + "." + self.config.collection_name,
    OpenAIEmbeddings(disallowed_special=()),
    index_name=self.config.search_index_name,
)
retriever = vector_search.as_retriever()

Hello Meera,

Thanks for the question. You can absolutely filter on metadata using Atlas Vector Search. To do this, declare each document field you’d like to filter on as a filter field in the vector search index definition.

This documentation shows how to set up that index and how to query with filters (see the “Filter” example): https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#examples
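To make the index side concrete, here is a minimal sketch of an index definition with one vector field and one filter field. The names are assumptions, not taken from this thread's cluster: `embedding` is the field LangChain writes vectors to by default, `1536` matches OpenAI's `text-embedding-ada-002`, and `user_id` is assumed to be stored at the top level of each document. The helper `build_vector_index_definition` is hypothetical; it only builds the definition, which you would then paste into the Atlas UI or pass to your driver.

```python
import json


def build_vector_index_definition(
    embedding_path="embedding",   # assumed: LangChain's default embedding field
    num_dimensions=1536,          # assumed: OpenAI text-embedding-ada-002 size
    similarity="cosine",
    filter_paths=("user_id",),    # assumed: metadata stored as top-level fields
):
    """Build an Atlas Vector Search index definition as a plain dict."""
    fields = [
        {
            "type": "vector",
            "path": embedding_path,
            "numDimensions": num_dimensions,
            "similarity": similarity,
        }
    ]
    # Every field used in a query filter must be declared with type "filter".
    fields.extend({"type": "filter", "path": path} for path in filter_paths)
    return {"fields": fields}


definition = build_vector_index_definition()
print(json.dumps(definition, indent=2))
```

If the embedding model or field names differ in your setup, adjust the arguments accordingly.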

And, if you’re using LangChain, the LangChain documentation also shows how to apply the filter in LangChain syntax: MongoDB Atlas | 🦜️🔗 Langchain
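On the LangChain side, the filter is usually passed through the retriever's search arguments. The sketch below only builds that argument dictionary, so it runs without a cluster. It assumes a LangChain version whose `MongoDBAtlasVectorSearch` accepts a `pre_filter` in MQL form, and that `user_id` is declared as a filter field in the index; `build_search_kwargs` is a hypothetical helper name.

```python
def build_search_kwargs(user_id, k=4):
    """Build retriever search kwargs that restrict results to one user.

    Assumes the index declares "user_id" as a filter field.
    """
    return {
        "k": k,
        # MQL-style match expression applied before the vector search.
        "pre_filter": {"user_id": {"$eq": user_id}},
    }


search_kwargs = build_search_kwargs("your-user-id")
print(search_kwargs)

# With a configured vector store, this would be used roughly as:
# retriever = vector_search.as_retriever(search_kwargs=search_kwargs)
```

Older LangChain releases used a different filter shape, so check the syntax against the version you have installed.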

Thanks!
I am working with LangChain, and the resource you provided worked for filtering the results at retrieval time.

Follow-up question:

How do I populate the vector database with a custom metadata field?
This is how I am adding the metadata:

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = text_splitter.split_documents(data)

# Help me find a better way than iterating over all the documents
for doc in docs:
    doc.metadata["user_id"] = user_id

MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    collection=MONGODB_COLLECTION,
    index_name=self.config.search_index_name,
)

Now for the retriever

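texts = [d.page_content for d in data]
docs = text_splitter.create_documents(texts, metadatas=[{"user_id": user_id} for _ in texts])

The above should do the job (metadatas is accepted by create_documents, which takes plain strings, not by split_documents).

Refer to Split by character | 🦜️🔗 Langchain (look for the metadata section).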
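If the aim is mainly to avoid looping over every chunk, another option is to tag the loaded pages before splitting, since `split_documents` copies each source document's metadata onto the chunks it produces. The sketch below shows only the tagging step. `Doc` is a stand-in for LangChain's `Document` so the snippet runs without LangChain installed, and `tag_documents` is a hypothetical helper, not a library function.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for LangChain's Document (page_content plus metadata dict)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def tag_documents(docs, extra):
    """Merge `extra` into each document's metadata in place and return docs."""
    for doc in docs:
        doc.metadata.update(extra)
    return docs


# One entry per loaded page, typically far fewer than the number of chunks.
data = [Doc("MongoDB home page text", {"source": "http://mongodb.com"})]
tag_documents(data, {"user_id": "your-user-id"})
print(data[0].metadata)

# With LangChain, the tagged pages would then be split as usual:
# docs = text_splitter.split_documents(data)
```

This keeps the loader's existing metadata (such as the source URL) alongside the new field.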
