I’m using the sample_mflix.movies collection from the Sample Mflix Dataset, with a knnVector index queried through knnBeta. I first followed this semantic search tutorial, and as a next step I want to filter the movies by the genres array field before running the semantic search.
(In the following queries, {{queryEmbedding}} is the embedding array. It is an environment variable in Postman.)
The regular vector search works fine:
{
  "collection": "movies",
  "database": "sample_mflix",
  "dataSource": "Cluster0",
  "pipeline": [
    {
      "$search": {
        "index": "vector_01",
        "knnBeta": {
          "vector": {{queryEmbedding}},
          "path": "plot_embedding",
          "k": 5
        }
      }
    },
    {
      "$set": {
        "score": {
          "$meta": "searchScore"
        }
      }
    },
    {
      "$project": {
        "embedding": 0
      }
    }
  ]
}
Some of the documents have Comedy in their genres array.
When I try to filter on the genres field, I get an empty result. This query doesn’t work, and I don’t know why:
{
  "collection": "movies",
  "database": "sample_mflix",
  "dataSource": "Cluster0",
  "pipeline": [
    {
      "$search": {
        "index": "vector_01",
        "knnBeta": {
          "vector": {{queryEmbedding}},
          "path": "plot_embedding",
          "k": 5,
          "filter": {
            "in": {
              "path": "genres",
              "value": [
                "Comedy"
              ]
            }
          }
        }
      }
    },
    {
      "$set": {
        "score": {
          "$meta": "searchScore"
        }
      }
    },
    {
      "$project": {
        "embedding": 0
      }
    }
  ]
}
Interestingly, a text filter works, so I can filter on the rated field:
{
  "collection": "movies",
  "database": "sample_mflix",
  "dataSource": "Cluster0",
  "pipeline": [
    {
      "$search": {
        "index": "vector_01",
        "knnBeta": {
          "vector": {{queryEmbedding}},
          "path": "plot_embedding",
          "k": 5,
          "filter": {
            "text": {
              "path": "rated",
              "query": "PASSED"
            }
          }
        }
      }
    },
    {
      "$set": {
        "score": {
          "$meta": "searchScore"
        }
      }
    },
    {
      "$project": {
        "embedding": 0
      }
    }
  ]
}
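The three request bodies above differ only in the filter clause inside knnBeta. A minimal Python sketch of assembling them programmatically makes that difference explicit. Note that build_body and its parameters are hypothetical names used for illustration only, and the short dummy vector stands in for the real 1536-dimension embedding:

```python
import json


def build_body(query_embedding, k=5, knn_filter=None):
    """Assemble the request body shown above.

    knn_filter is an optional search-operator clause (hypothetical
    parameter name) placed under knnBeta.filter when provided.
    """
    knn = {"vector": query_embedding, "path": "plot_embedding", "k": k}
    if knn_filter is not None:
        knn["filter"] = knn_filter
    return {
        "collection": "movies",
        "database": "sample_mflix",
        "dataSource": "Cluster0",
        "pipeline": [
            {"$search": {"index": "vector_01", "knnBeta": knn}},
            {"$set": {"score": {"$meta": "searchScore"}}},
            {"$project": {"embedding": 0}},
        ],
    }


# Placeholder vector; the real query embedding has 1536 dimensions.
dummy = [0.1, 0.2, 0.3]

# The filter that returns an empty result.
genres_body = build_body(
    dummy, knn_filter={"in": {"path": "genres", "value": ["Comedy"]}}
)
# The filter that works.
rated_body = build_body(
    dummy, knn_filter={"text": {"path": "rated", "query": "PASSED"}}
)

print(json.dumps(genres_body["pipeline"][0], indent=2))
```

Building the bodies this way only rules out copy-paste differences between the working and failing requests; it does not change what the server does with the filter.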
Additionally, I tried modifying the index to include the genres field. This is the definition of the vector_01 index:
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "genres": {
        "analyzer": "lucene.keyword",
        "type": "string"
      },
      "plot_embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      }
    }
  },
  "storedSource": {
    "include": [
      "title",
      "plot",
      "genres"
    ]
  }
}
I tried both a single filter, as in the examples above, and a must clause inside a compound filter. The results are the same.
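For reference, the compound variant wraps the same clause in a must list. A rough sketch of that shape in Python, assuming the usual compound operator layout with a must array; as_compound_must is a hypothetical helper name, and the result would go under knnBeta.filter in place of the bare clause:

```python
def as_compound_must(*clauses):
    """Wrap one or more search-operator clauses in compound.must.

    Hypothetical helper: it only builds the dictionary, it does not
    validate the clauses against the index definition.
    """
    return {"compound": {"must": list(clauses)}}


# The same genres clause used in the failing query above.
genres_clause = {"in": {"path": "genres", "value": ["Comedy"]}}

compound_filter = as_compound_must(genres_clause)
print(compound_filter)
```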
How can I filter on arrays while using knnBeta at the same time?