How to use preFilter with LangChain JS? Tried many things

Inside the Atlas Cloud:
Atlas Cluster > Atlas Search > Edit Search Index ‘default’

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "agenticDocId": [
        {
          "normalizer": "lowercase",
          "type": "token"
        }
      ],
      "embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      }
    }
  }
}

Using MongoDBAtlasVectorSearch with asRetriever():

const retriever = vectorstore.asRetriever({
      callbacks: [
        {
          handleRetrieverEnd(documents) {
            resolveWithDocuments(documents);
          },
        },
      ],
      filter: {
        preFilter: {
          agenticDocId: {
            $eq: docId,
          },
        },
      },
    });

I get this error:

error:  PlanExecutor error during aggregation :: caused by :: Path 'agenticDocId' needs to be indexed as token

But it seems I have indexed the field ‘agenticDocId’ as token.
I am not seeing this field in the Atlas collection, but it is returned by the query when the filter above (which causes the error) is removed. (screenshot of a collection item)

I want to filter by this field I added to the document metadata ‘agenticDocId’

I’m passing it as a string to the filter, as I’ve seen in other examples. I am setting the custom metadata field in each of the documents, after using splitter from langchain, and then calling addDocuments:

const iResponse = await mongoDBAtlasVectorSearch.addDocuments(
  docsWithMetaData
);

Please guide me on how to prefilter by a specific document id.
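For reference, the metadata-tagging step described above could look roughly like the sketch below. `withDocId` is a hypothetical helper, not part of LangChain, and the chunks are plain objects that only mimic the `pageContent`/`metadata` shape the splitter produces. It assumes the vector store writes each metadata key as a top-level field of the stored document, which matches the “source”, “blobType”, “pdf” and “loc” fields visible in the collection.

```javascript
// Hypothetical helper: copy each chunk and add the parent document id
// to its metadata, without mutating the input array or its objects.
function withDocId(docs, docId) {
  return docs.map((doc) => ({
    ...doc,
    metadata: { ...doc.metadata, agenticDocId: docId },
  }));
}

// Stand-ins for the chunks returned by a LangChain text splitter.
const chunks = [
  { pageContent: "first chunk", metadata: { source: "blob" } },
  { pageContent: "second chunk", metadata: { source: "blob" } },
];

const docsWithMetaData = withDocId(chunks, "doc-123");
console.log(docsWithMetaData[0].metadata);
```

The resulting `docsWithMetaData` array is what would be passed to `addDocuments`.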


Hello @Konrad_Gnat, welcome to the forums!

I don’t see a field called “agenticDocId” in the sample document in the screenshot you included. Is it cut off? I only see “text”, “embedding”, “source”, “blobType”, “pdf” and “loc”.

No, it is not cut off; the field is not there.

I am getting it back from the query without the filter.

I am using LangChain to vectorize the data into MongoDB.
It is not clear from the docs how to set this custom metadata field in the collection.
I assume it would be added as in the code above.

So the field must be inside of your MongoDB documents when they are inserted into the database.

Is there another snippet of code that shows the document being written to the DB?

Yes, it is the line with addDocuments.

The objects do have this property; I log them to the console to confirm.

I haven’t found any other examples or documentation on how to insert this document with the vector, so I need guidance on this. I am using LangChain JS.

Are all the other parts correct?

I can see that I am calling addDocuments with this object, which has the agenticDocId field:

But in the collection these fields are not present. They do not get inserted into the collection. Why is that? Here is a screenshot of MongoDB Compass:

They are inserted into MongoDB here:

The example doc you show being inserted seems like a completely different doc from the one you see in the Atlas console. It has “pageContent” and “metadata”, whereas the doc in Atlas has “text”, “embedding”, etc.

Is it possible they are getting inserted into a different collection?

Are you sure that snippet where you are inserting documents is inserting the right things into the database?

I would expect the LangChain code to look something like this (shown here in Python) to both do the embedding and insert the documents into the DB:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import MongoDBAtlasVectorSearch

# insert the documents in MongoDB Atlas with their embedding
vector_search = MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    collection=MONGODB_COLLECTION,
    index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
)

I was able to insert the document now. The problem was I wasn’t deleting the old documents properly, and so didn’t see the new ones because they were further down.

However, I am still getting this error when trying to use preFilter:

    const retriever = vectorstore.asRetriever({
      callbacks: [
        {
          handleRetrieverEnd(documents) {
            resolveWithDocuments(documents);
          },
        },
      ],
      filter: {
        preFilter: {
          agenticDocId: {
            $eq: docId,
          },
        },
      },
    });

Error:

error:  PlanExecutor error during aggregation :: caused by :: Path 'agenticDocId' needs to be indexed as token

Here is the search index ‘agenticDocId’:

And here is how it’s configured:

Please guide me on how to use preFilter. I have tried other syntaxes as well, but got other kinds of errors, like this:

      preFilter: {
        text: {
          path: "agenticDocId",
          query: docId,
        },
      },

and this:

      "compound": {
        "must": [
          {
            "text": {
              "path": "agenticDocId",
              "query": "docId"
            }
          }
        ]
      }

I tried indexing it as the ‘token’ field type, but the same error appears.

Hi @Konrad_Gnat, you are creating the index in ‘Atlas Search’ (which is full-text search), while you should be creating one in ‘Atlas Vector Search’, which is what LangChain uses. If you click on the “Atlas Search” tab, it will show the following screen:

In the Atlas Vector Search screen, click on “Edit Json”, then enter the following definition:

{
  "fields":[
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": <number-of-dimensions>,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "agenticDocId"
    },
    ...
  ]
}

You can read about Atlas Vector Search here.
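As a rough sketch of how the index definition and the retriever filter fit together: field paths are case-sensitive, so the "filter" path in the index has to match the field name used in preFilter exactly. The helpers `buildVectorIndexDefinition` and `buildPreFilter` below are hypothetical (not part of LangChain or the MongoDB driver); they only build plain objects from one shared constant so the two names cannot drift apart. The 1536 dimensions are taken from the original index definition in the question.

```javascript
// Hypothetical helper: build an Atlas Vector Search index definition with
// one vector field and any number of filter fields.
function buildVectorIndexDefinition(numDimensions, filterPaths) {
  return {
    fields: [
      { type: "vector", path: "embedding", numDimensions, similarity: "cosine" },
      ...filterPaths.map((path) => ({ type: "filter", path })),
    ],
  };
}

// Hypothetical helper: build the filter object passed to asRetriever().
// The preFilter uses MQL-style operators such as $eq.
function buildPreFilter(path, value) {
  return { preFilter: { [path]: { $eq: value } } };
}

// Single source of truth for the field name used in both places.
const FILTER_PATH = "agenticDocId";

const indexDefinition = buildVectorIndexDefinition(1536, [FILTER_PATH]);
const filter = buildPreFilter(FILTER_PATH, "doc-123");

// Paste this JSON into the Atlas Vector Search index editor.
console.log(JSON.stringify(indexDefinition, null, 2));
```

The `filter` object would then be passed as in the question, for example `vectorstore.asRetriever({ filter })`.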