I’m having a similar problem, but with Python. The index mentioned here helped.
Relevant parts of my model example:
class Question(Document):
    content = StringField(required=True)
    ...

class Theme(Document):
    question = ReferenceField(Question, required=True)
    text = StringField(required=True)  # AKA Category title
    embedding = ListField(FloatField())
So, basically, Theme has a reference field that points to a Question.
I want to run a semantic search on the theme collection, but pre-filter by question first.
Here is my index for theme:
{
  "mappings": {
    "fields": {
      "embedding": [
        {
          "dimensions": 1536,
          "similarity": "cosine",
          "type": "knnVector"
        }
      ],
      "question": {
        "type": "token"
      }
    }
  }
}
And here’s my simplified Python code:
from langchain.vectorstores import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings
from pymongo import MongoClient
from bson import ObjectId
import os, json
# Set environment variables
os.environ['OPENAI_API_KEY'] = ""
os.environ["MONGODB_HOST"] = ""
# Connect to the MongoDB database
mongo_client = MongoClient(os.environ["MONGODB_HOST"])['text-mining-langchain']
embeddings = OpenAIEmbeddings()
# Get the collection
collection = mongo_client['theme']
# Filter the documents based on the 'question' field
question_id = ObjectId('663252400674de6854bf6594')
pre_filter_dict = {"question": str(question_id)}
vectorstore = MongoDBAtlasVectorSearch(
    collection,
    embeddings,
    text_key="text",
    embedding_key="embedding",
    index_name="default",
)
# Perform similarity search on the filtered documents
query = 'Ease of Use and Accuracy'
docs = vectorstore.similarity_search_with_score(query, k=10, pre_filter=pre_filter_dict)
docs
When I run it with an empty pre_filter_dict, I get results. When I pass the pre-filter shown above, I get zero results, even though some themes do reference that question.
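One check that might narrow this down: the filter above compares against str(question_id), while, as far as I know, a MongoEngine ReferenceField stores an ObjectId by default rather than a string. The sketch below is only a diagnostic, not a fix. The helper name field_type_counts is hypothetical, and the sample documents are stand-ins so the snippet runs without a database.

```python
from collections import Counter


def field_type_counts(docs, field):
    """Count the Python type names found under `field` across the given documents."""
    return Counter(type(doc.get(field)).__name__ for doc in docs)


# Stand-in documents so the snippet runs on its own. Against the real
# collection the call would look something like (assumption):
#   field_type_counts(collection.find().limit(100), "question")
sample = [
    {"question": "663252400674de6854bf6594"},  # stored as a string
    {"question": 42},                          # stored as some other type
    {},                                        # field missing entirely
]
print(field_type_counts(sample, "question"))
```

If the real collection reports ObjectId for question, a string-valued filter would not be expected to match, which would be consistent with the empty result.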
Can anyone help with this?