I am trying to build a simple RAG pipeline with LangChain that answers basic questions about a list of authors and their books. The answers should be restricted by author_gender: if I select "female", I should get answers only from books written by female authors, and likewise for "male".
The data looks something like this:
{"text":"{"title": "Pride and Prejudice", "author": "Jane Austen"}",
"author_gender":"female",
"publication_year":{"1813"}}
Below is the code I am using. It is a simple filter passed to the retriever, but it appears to do nothing: the results are the same with or without the filter.
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vector_search.as_retriever(filter={"author_gender": "female"}))
question = "List novels written by female authors"
result = qa({"query": question})
print(result["result"])
Based on the provided context, the novels written by male authors are:
- "The Great Gatsby" by F. Scott Fitzgerald
- "To Kill a Mockingbird" by Harper Lee
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vector_search.as_retriever())
question = "List novels written by male authors"
result = qa({"query": question})
print(result["result"])
Based on the provided context, the novels written by male authors are:
- "The Great Gatsby" by F. Scott Fitzgerald
- "To Kill a Mockingbird" by Harper Lee
Is there a reliable way to filter the retrieved results based on the available metadata?
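To make the expected behaviour concrete, here is a minimal plain-Python sketch of the filtering I am after, with no LangChain or MongoDB involved. The records follow the shape shown above. The second and third records, their `author_gender` values, the string form of `publication_year`, and the helper name `titles_for_gender` are assumptions added for illustration only.

```python
import json

# Hypothetical sample records in the same shape as the data above.
# Only the first record comes from the question; the rest are illustrative.
records = [
    {"text": json.dumps({"title": "Pride and Prejudice", "author": "Jane Austen"}),
     "author_gender": "female", "publication_year": "1813"},
    {"text": json.dumps({"title": "The Great Gatsby", "author": "F. Scott Fitzgerald"}),
     "author_gender": "male", "publication_year": "1925"},
    {"text": json.dumps({"title": "To Kill a Mockingbird", "author": "Harper Lee"}),
     "author_gender": "female", "publication_year": "1960"},
]


def titles_for_gender(records, gender):
    """Return the titles of records whose author_gender metadata equals `gender`."""
    return [
        json.loads(record["text"])["title"]
        for record in records
        if record["author_gender"] == gender
    ]


print(titles_for_gender(records, "female"))
print(titles_for_gender(records, "male"))
```

With the filter set to "female", only documents carrying that metadata value should ever reach the LLM as context, so a male author's book should not be able to appear in the answer.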
Libraries used:
langchain==0.2.6
pymongo==4.8.0
Python version: 3.10
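One possible direction, sketched under assumptions: if `vector_search` is a `MongoDBAtlasVectorSearch` instance (the question does not say so explicitly), the filter probably has to be passed through `search_kwargs` as a `pre_filter` rather than as a top-level `filter` argument. As far as I understand, `as_retriever` only forwards `search_kwargs` to the underlying search. The helper name `build_search_kwargs` and the default `k=4` are hypothetical, and the commented usage is not run here because it needs a live Atlas cluster.

```python
def build_search_kwargs(author_gender, k=4):
    """Build retriever search kwargs with a metadata pre-filter.

    Assumption: the vector store accepts a `pre_filter` expressed in
    MongoDB query syntax, as MongoDBAtlasVectorSearch is understood to do.
    """
    return {
        "k": k,
        "pre_filter": {"author_gender": {"$eq": author_gender}},
    }


search_kwargs = build_search_kwargs("female")
print(search_kwargs)

# Assumed usage (requires the objects from the question and a live cluster):
# retriever = vector_search.as_retriever(search_kwargs=search_kwargs)
# qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
```

For a pre-filter like this to take effect, the Atlas Vector Search index most likely also needs `author_gender` declared as a filter field. It is worth checking the index definition before concluding that the filter is being ignored.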