Using MongoDBLoader for Mistral 7B LLM model


I’ve been working lately with the MongoDB VectorStore and a MongoDB Atlas Search Index to store data for my LLM, Mistral 7B.

I’ve been loading simple data files like small .txt files or PDFs. However, my main goal is to give my LLM access to a MongoDB database so I can ask questions about it.

So, I have tried the LangChain MongodbLoader, but I did not get the results I expected.

First of all, am I loading the database correctly? Do I need to change my search index? I believe the error is in the retriever, but I just don’t know how to fix it. Is there another way to create a retriever?

Thank you guys.

Here is the loader code:

import pymongo
from langchain_community.document_loaders import MongodbLoader

client = pymongo.MongoClient("mongodb+srv://")
dbName = "LLM2"
collectionName = "Mistral2"
collection = client[dbName][collectionName]

loader = MongodbLoader(
    connection_string="mongodb+srv://",
    db_name="sample_restaurants",
    collection_name="restaurants",
    filter_criteria={"borough": "Bronx", "cuisine": "Bakery"},
)

doc = loader.load()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
)
data = splitter.split_documents(doc)

embeddings = HuggingFaceBgeEmbeddings(
    model_name="sentence-transformers/paraphrase-MiniLM-L6-v2",
)

vectorStore = MongoDBAtlasVectorSearch.from_documents(
    data, embeddings, collection=collection, index_name="Model"
)

When I check in Compass whether all the data has been uploaded, everything looks fine, so I guess the problem is not there.
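Beyond eyeballing it in Compass, a small helper like this (a sketch; `embeddings_look_valid` is my own hypothetical name, and it assumes the `collection` handle from above) could confirm that the stored chunks carry an `embedding` vector of the length the index expects:

```python
def embeddings_look_valid(coll, dims=384):
    # Spot-check one stored chunk: the "embedding" field must exist and its
    # length must equal numDimensions in the Atlas vector index (384 here,
    # matching paraphrase-MiniLM-L6-v2's output size).
    doc = coll.find_one({})
    return doc is not None and len(doc.get("embedding", [])) == dims
```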

I’m using the following search index:

{
  "fields": [
    {
      "numDimensions": 384,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}
Then, I applied the standard RAG architecture:

def query_data(query):

    docs = vectorStore.similarity_search(query, k=1)
    as_output = docs[0].page_content

    llm = CTransformers(model="./mistral-7b-instruct-v0.1.Q4_0.gguf",
                        model_type="llama",
                        #config={'max_new_tokens': 400, 'temperature': 0.01}
                        )

    retriever = vectorStore.as_retriever()

    QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

    qa = RetrievalQA.from_chain_type(llm, chain_type="stuff", retriever=retriever,
                                     chain_type_kwargs={'prompt': QA_CHAIN_PROMPT})

    retriever_output = qa.invoke(query)

    return as_output, retriever_output

But when I ask the model how many restaurants it has information about, it answers with only 4 restaurants, and they are never the same ones. The filter criteria match 70 restaurants.
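One thing I noticed while debugging: `as_retriever()` returns only 4 documents per query by default in LangChain, which would explain the constant “4 restaurants” answers. A sketch of a wider retriever (the `build_retriever` helper name is my own; it assumes the `vectorStore` from above):

```python
def build_retriever(vector_store, k=70):
    # as_retriever() defaults to fetching 4 documents per query; raising k
    # lets the "stuff" chain see all 70 filtered restaurants at once
    # (keeping an eye on the model's context window, though).
    return vector_store.as_retriever(search_kwargs={"k": k})
```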

The same happens when I ask for specific information about one restaurant: it returns wrong data, or it just tells me it has no information about that restaurant when it should.

Also, what if I want the document id to be in the metadata too?
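For that last point, one workaround I can think of (a sketch using plain dicts; in real code each entry would be a LangChain `Document`, and `docs_with_ids` is a hypothetical helper name) is to fetch the raw records with pymongo and copy each `_id` into the metadata myself before embedding:

```python
def docs_with_ids(raw_docs):
    # Build document dicts by hand so every chunk keeps its Mongo _id in
    # the metadata (MongodbLoader only records database/collection names).
    out = []
    for raw in raw_docs:
        doc_id = str(raw.get("_id", ""))
        body = {k: v for k, v in raw.items() if k != "_id"}
        out.append({"page_content": str(body), "metadata": {"_id": doc_id}})
    return out
```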