How to Index Vector Embeddings for Vector Search
You can use the knnVector type to index vector embeddings. The vector field can be represented as an array of numbers of the following types:
- BSON int32, int64, or double data types for querying using the knnBeta operator.
- BSON double data type for querying using the $vectorSearch stage.
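As a minimal sketch of the type requirement above, the following Python helper converts a numeric sequence to a list of floats before it is stored. It assumes you write documents with PyMongo, which encodes Python float values as BSON double and Python int values as BSON int32 or int64. The name to_double_vector is hypothetical and not part of any MongoDB library.

```python
def to_double_vector(values):
    """Return the values as a list of Python floats (hypothetical helper).

    PyMongo encodes Python floats as BSON double, the type this page
    lists for querying with the $vectorSearch stage.
    """
    vector = []
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"not a number: {value!r}")
        vector.append(float(value))
    return vector


# Mixed ints and floats become a uniform list of floats.
embedding = to_double_vector([1, 2.5, -3])
```

Converting up front keeps the stored array uniform, so one stray integer in an embedding does not end up encoded as a different BSON type.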
You can use the knnBeta operator, which is now deprecated, and the $vectorSearch stage in your aggregation pipeline to query fields indexed as knnVector.
Note
You can't use the Atlas Search Visual Editor in the Atlas UI to configure fields of type knnVector. Instead, use the Atlas Search JSON Editor.
You can also use Atlas Vector Search with local Atlas deployments that you create with the Atlas CLI. To learn more, see Create a Local Atlas Deployment.
Review knnVector Type Limitations
You can't index fields inside arrays of documents or fields inside arrays of objects (Atlas Search embeddedDocuments type) as the knnVector type.
Define the Index for the knnVector Type
The following is the JSON syntax for the knnVector type.
Replace the default index definition with the following. To learn more
about the fields, see Field Properties.

    {
      "mappings": {
        "name": "<index-name>",
        "dynamic": true|false,
        "fields": {
          "<field-name>": {
            "type": "knnVector",
            "dimensions": <number-of-dimensions>,
            "similarity": "euclidean | cosine | dotProduct"
          }
        }
      }
    }

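The syntax above can also be assembled and checked in code before you paste it into the JSON Editor. The following is a sketch only: build_knn_vector_index is a hypothetical helper, not an Atlas API. The limits it checks (4096 dimensions, three similarity values) come from the field properties described on this page. It omits the optional-looking "name" entry shown in the syntax, matching the example later on this page.

```python
import json

# Limits and allowed values as described in the field properties on this page.
MAX_DIMENSIONS = 4096
SIMILARITY_FUNCTIONS = {"euclidean", "cosine", "dotProduct"}


def build_knn_vector_index(field_name, dimensions, similarity, dynamic=False):
    """Return an index definition dict for one knnVector field (hypothetical helper)."""
    if isinstance(dimensions, bool) or not isinstance(dimensions, int):
        raise ValueError("dimensions must be an integer")
    if not 1 <= dimensions <= MAX_DIMENSIONS:
        raise ValueError(f"dimensions must be between 1 and {MAX_DIMENSIONS}")
    if similarity not in SIMILARITY_FUNCTIONS:
        raise ValueError(f"similarity must be one of {sorted(SIMILARITY_FUNCTIONS)}")
    return {
        "mappings": {
            "dynamic": dynamic,
            "fields": {
                field_name: {
                    "type": "knnVector",
                    "dimensions": dimensions,
                    "similarity": similarity,
                }
            },
        }
    }


# Build a definition and print it as JSON for the Atlas Search JSON Editor.
definition = build_knn_vector_index("plot_embedding", 1536, "euclidean", dynamic=True)
print(json.dumps(definition, indent=2))
```

Validating locally catches a misspelled similarity value or an oversized dimension count before the index definition reaches Atlas.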
Configure knnVector Field Properties
The knnVector type has the following options:
Option | Type | Necessity | Purpose |
---|---|---|---|
type | string | Required | Human-readable label that identifies this field type. Value must be knnVector. |
dimensions | int | Required | Number of vector dimensions, which Atlas Search enforces at index- and query-time. This value can't be greater than 4096. |
similarity | string | Required | Vector similarity function to use to search for top K-nearest neighbors. Value can be one of the following: euclidean (measures the distance between the ends of vectors), cosine (measures similarity based on the angle between vectors), or dotProduct (measures similarity like cosine, but also takes vector magnitude into account). |
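To make the three similarity options concrete, here is a small pure-Python sketch of the underlying math. It illustrates the textbook definitions only. It is not how Atlas computes or normalizes the scores it returns, and the function names are chosen for this example.

```python
import math


def _check_lengths(a, b):
    # Vectors must have the same number of dimensions to be compared.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def euclidean_distance(a, b):
    """Straight-line distance between the ends of two vectors."""
    _check_lengths(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a, b):
    """Sum of element-wise products; sensitive to both angle and magnitude."""
    _check_lengths(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by both magnitudes; depends only on the angle."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)
```

Note that scaling a vector changes its euclidean distance and dot product with other vectors but leaves its cosine similarity unchanged, which is the practical difference between the options.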
Try an Example for the knnVector Type
The following index definition for the sample_mflix.embedded_movies collection dynamically indexes all the dynamically indexable fields in the collection and statically indexes the plot_embedding field as the knnVector type. The plot_embedding field contains embeddings created using OpenAI's text-embedding-ada-002 embeddings model. The index definition specifies 1536 vector dimensions and measures similarity using the euclidean function.

    {
      "mappings": {
        "dynamic": true,
        "fields": {
          "plot_embedding": {
            "type": "knnVector",
            "dimensions": 1536,
            "similarity": "euclidean"
          }
        }
      }
    }

If you load the sample data on your cluster and create the preceding Atlas Search index for this collection, you can run $vectorSearch queries against this collection. To learn more about the sample queries that you can run, see $vectorSearch Examples.
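As a rough sketch of what such a query could look like from Python, the following helper builds an aggregation pipeline with a $vectorSearch stage against the plot_embedding field. Several things here are assumptions: the index name vector_index is a placeholder for whatever you named your index, the helper build_vector_search_pipeline is hypothetical, and the query vector must come from the same embeddings model as the stored data (the zero vector in the usage lines is only a stand-in).

```python
EXPECTED_DIMENSIONS = 1536  # matches the example index definition on this page


def build_vector_search_pipeline(query_vector, index_name="vector_index",
                                 limit=5, num_candidates=100):
    """Build an aggregation pipeline with a $vectorSearch stage (hypothetical helper)."""
    if len(query_vector) != EXPECTED_DIMENSIONS:
        raise ValueError(f"query vector must have {EXPECTED_DIMENSIONS} dimensions")
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least as large as limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,  # assumed index name; replace with yours
                "path": "plot_embedding",
                # Floats are encoded as BSON double by PyMongo.
                "queryVector": [float(x) for x in query_vector],
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            # Return a few readable fields plus the similarity score.
            "$project": {
                "_id": 0,
                "title": 1,
                "plot": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Placeholder query vector; a real one comes from your embeddings model.
pipeline = build_vector_search_pipeline([0.0] * EXPECTED_DIMENSIONS)

# With PyMongo and a connected client (not shown), the pipeline would run as:
#   results = client["sample_mflix"]["embedded_movies"].aggregate(pipeline)
```

Keeping the pipeline construction separate from the database call makes it possible to inspect or unit test the query without a live cluster.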