knnBeta
Definition
knnBeta
The knnBeta operator uses the Hierarchical Navigable Small Worlds (HNSW) algorithm to perform semantic search. You can use Atlas Search support for kNN queries to find items similar to a selected product, search for images, and so on.
Syntax
knnBeta has the following syntax:

    {
      $search: {
        "index": "<index name>", // optional, defaults to "default"
        "knnBeta": {
          "vector": [<array-of-numbers>],
          "path": "<field-to-search>",
          "filter": {<filter-specification>},
          "k": <number>,
          "score": {<options>}
        }
      }
    }
Options
Field | Type | Description | Necessity |
---|---|---|---|
filter | document | Any Atlas Search operator to filter the documents based on metadata or certain search criteria, which can help narrow down the scope of vector search. | Optional |
k | number | Number of nearest neighbors to return. You can specify a number higher than the number of documents to return ($limit) to increase accuracy. | Required |
path | string | Indexed knnVector type field to search. See Path Construction for more information. | Required |
score | document | Score assigned to matching documents in the results. To learn more, see scoring behavior. | Optional |
vector | array of numbers | Array of numbers of BSON types int or double that represent the query vector. The array size must match the number of vector dimensions specified in the index for the field. | Required |
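
To make the required and optional fields in the table concrete, here is a minimal sketch of a helper that assembles a knnBeta stage from plain options. The function name `buildKnnBetaStage` is hypothetical and not part of any MongoDB driver. The three-element vector is a placeholder (a real query vector must match the dimensions in your index), the `year` field in the filter is an assumed numeric field, and the positive-integer check on `k` is an assumption of this sketch rather than a documented rule.

```javascript
// Hypothetical helper (not a MongoDB API): builds a $search stage for
// knnBeta from the options listed in the table above.
function buildKnnBetaStage({ vector, path, k, filter, score, index }) {
  // vector, path, and k are required.
  const isNumber = (x) => typeof x === "number" && Number.isFinite(x);
  if (!Array.isArray(vector) || vector.length === 0 || !vector.every(isNumber)) {
    throw new TypeError("vector must be a non-empty array of numbers");
  }
  if (typeof path !== "string" || path.length === 0) {
    throw new TypeError("path must be a non-empty string");
  }
  // Assumption of this sketch: k is a positive integer.
  if (!Number.isInteger(k) || k < 1) {
    throw new RangeError("k must be a positive integer");
  }

  const knnBeta = { vector, path, k };
  // filter and score are optional; include them only when provided.
  if (filter !== undefined) knnBeta.filter = filter;
  if (score !== undefined) knnBeta.score = score;

  const search = { knnBeta };
  // index is optional; Atlas Search uses "default" when it is omitted.
  if (index !== undefined) search.index = index;
  return { $search: search };
}

// Placeholder vector and an assumed numeric "year" field for the filter.
const stage = buildKnnBetaStage({
  vector: [0.1, -0.2, 0.3],
  path: "plot_embedding",
  k: 10,
  filter: { range: { path: "year", gte: 1990, lte: 2000 } },
});
console.log(JSON.stringify(stage, null, 2));
```

The returned object can be placed as the first stage of an aggregation pipeline. Validating on the client only catches obvious mistakes early; the server remains the authority on what is accepted.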
Behavior
You can run kNN queries only against fields indexed as the Atlas Search knnVector type.
You can use $limit after the $search stage to limit the number of documents in the knnBeta query results. We recommend setting the value for k higher than the value for $limit. This overrequest pattern is the main way to trade off latency and recall in your approximate nearest neighbor searches. Empirically, we have seen a multiplier of 5-10 work well for many use cases, but we recommend tuning this on your specific dataset.
Example
The following query finds the 150 nearest neighbors to the query vector and limits the results to 50.
    db.<collection>.aggregate([
      {
        "$search": {
          "knnBeta": {
            "vector": <array-of-numbers-to-search>,
            "path": <indexed-field-to-search>,
            "k": 150
          }
        }
      },
      {
        "$limit": 50
      }
    ])
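
The overrequest pattern can also be expressed as a small helper that derives k from the number of results you want. This is only a sketch: `overrequestK` and `buildKnnPipeline` are hypothetical names, the default multiplier of 5 is taken from the low end of the 5-10 range mentioned above, and the three-element vector is a placeholder for a real query vector.

```javascript
// Hypothetical helper: derive k from the desired number of results.
function overrequestK(limit, multiplier = 5) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  if (typeof multiplier !== "number" || !(multiplier >= 1)) {
    throw new RangeError("multiplier must be a number >= 1");
  }
  // Round up so that k is a whole number and never smaller than limit.
  return Math.ceil(limit * multiplier);
}

// Hypothetical helper: a two-stage pipeline that requests k candidates
// from knnBeta and then keeps only the first `limit` results.
function buildKnnPipeline(queryVector, path, limit, multiplier = 5) {
  return [
    {
      $search: {
        knnBeta: {
          vector: queryVector,
          path,
          k: overrequestK(limit, multiplier),
        },
      },
    },
    { $limit: limit },
  ];
}

// Reproduces the k = 150, $limit = 50 example with a multiplier of 3.
const pipeline = buildKnnPipeline([0.1, 0.2, 0.3], "plot_embedding", 50, 3);
console.log(pipeline[0].$search.knnBeta.k); // 150
console.log(pipeline[1].$limit); // 50
```

Keeping the multiplier as a parameter makes it easy to tune against your own dataset, as recommended above.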
Performance
To improve query performance, use the $project stage to select the fields to return in the results, unless you need all the fields in the results. We recommend excluding the vector field in the $project stage.
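
As an illustration of the recommendation above, the following sketch appends an exclusion-only $project stage that drops the vector field from the results. The helper name `withoutVectorField` is hypothetical, and `plot_embedding` with a three-element vector stands in for your own indexed field and query vector.

```javascript
// Hypothetical helper: returns a new pipeline with a $project stage
// appended that excludes the (typically large) vector field.
function withoutVectorField(pipeline, vectorPath) {
  if (typeof vectorPath !== "string" || vectorPath.length === 0) {
    throw new TypeError("vectorPath must be a non-empty string");
  }
  // An exclusion-only projection keeps every other field as-is.
  return [...pipeline, { $project: { [vectorPath]: 0 } }];
}

const basePipeline = [
  {
    $search: {
      knnBeta: { vector: [0.1, 0.2, 0.3], path: "plot_embedding", k: 150 },
    },
  },
  { $limit: 50 },
];

const slimPipeline = withoutVectorField(basePipeline, "plot_embedding");
console.log(JSON.stringify(slimPipeline[2])); // {"$project":{"plot_embedding":0}}
```

If you only need a handful of fields, an inclusion-style projection that lists those fields achieves the same effect, because the vector field is then left out implicitly.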
Scoring
You can use the score field with the $meta expression searchScore in the $project stage to return the score for the documents in the results.
Atlas Search scores the results for kNN queries in a fixed range from 0 to 1 only. For cosine and dotProduct similarities, Atlas Search normalizes the score using the following algorithm:
score = (1 + cosine/dot_product(v1,v2)) / 2
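
To see how the normalization maps similarities into the 0 to 1 range, here is a small sketch that applies the formula to a locally computed cosine similarity. It illustrates the arithmetic only and makes no claim about how Atlas Search computes scores internally. The helper names are hypothetical, and for dotProduct the result stays within 0 to 1 only under the assumption that both vectors are unit length. The `title` field in the projection is an assumed field name.

```javascript
// Dot product of two equal-length numeric arrays.
function dot(a, b) {
  if (a.length !== b.length) {
    throw new RangeError("vectors must have the same length");
  }
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Cosine similarity, in the range -1 to 1 for non-zero vectors.
function cosine(a, b) {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// The normalization from the formula above: maps [-1, 1] onto [0, 1].
function normalizedScore(similarity) {
  return (1 + similarity) / 2;
}

console.log(normalizedScore(cosine([1, 0], [1, 0])));  // 1   (same direction)
console.log(normalizedScore(cosine([1, 0], [0, 1])));  // 0.5 (orthogonal)
console.log(normalizedScore(cosine([1, 0], [-1, 0]))); // 0   (opposite)

// A $project stage that returns the score next to an assumed "title" field.
const scoreProjection = {
  $project: { title: 1, score: { $meta: "searchScore" } },
};
```

A score of 0.5 therefore corresponds to unrelated (orthogonal) vectors, not to a moderately good match.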
Limitations
The knnBeta operator must be the top-level operator in your queries. Therefore, you can't use the knnBeta operator inside the following:
embeddedDocument operator
compound operator
facet collector
You can't use the $search sort option with the knnBeta operator.
We don't recommend paginating your Atlas Search results using $skip and $limit after the $search stage.
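
If you assemble $search stages programmatically, a client-side check can flag a knnBeta operator that has ended up nested before the query is sent. The sketch below is one possible way to do that. `containsKnnBeta` and `knnBetaIsNested` are hypothetical helpers, not Atlas Search features, and they only inspect object keys in the specification you pass in.

```javascript
// Returns true if any object key named "knnBeta" appears in `value`.
function containsKnnBeta(value) {
  if (Array.isArray(value)) {
    return value.some(containsKnnBeta);
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).some(
      ([key, child]) => key === "knnBeta" || containsKnnBeta(child)
    );
  }
  return false;
}

// Takes the body of a $search stage. A knnBeta key at the top level is
// fine; one found inside any top-level value is nested.
function knnBetaIsNested(searchSpec) {
  return Object.values(searchSpec).some(containsKnnBeta);
}

const topLevel = {
  knnBeta: { vector: [0.1, 0.2, 0.3], path: "plot_embedding", k: 10 },
};
const insideCompound = {
  compound: {
    must: [{ knnBeta: { vector: [0.1, 0.2, 0.3], path: "plot_embedding", k: 10 } }],
  },
};

console.log(knnBetaIsNested(topLevel));       // false
console.log(knnBetaIsNested(insideCompound)); // true
```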
Examples
The following queries search the sample sample_mflix.embedded_movies collection using the knnBeta operator. The queries search the plot_embedding field, which contains embeddings created using OpenAI's text-embedding-ada-002 embedding model. If you added the sample collection to your Atlas cluster and created the sample index definition for the collection, you can switch to the sample_mflix database and run the following queries against the collection.