How to Index Vector Fields

The vector field type and vectorSearch operator are available as Preview features. The feature and the corresponding documentation might change at any time during the Preview period. To learn more, see Preview Features.

Interface

You can use the vector type to index vector embeddings. A vector field must contain an array of numbers of one of the following types:

BSON int32, int64, or double data types
BSON double data type

You can use the vectorSearch operator, which is similar to the $vectorSearch stage, inside the $search stage of your aggregation pipeline to query fields indexed as the vector type.
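As a rough illustration of where the operator sits, the following Python sketch builds a $search pipeline that nests vectorSearch. The option names inside vectorSearch (`path`, `queryVector`, `numCandidates`, `limit`) are assumed to mirror those of the $vectorSearch stage. The index name, field name, and query vector are hypothetical placeholders, so check the operator reference before relying on this shape.

```python
from typing import Any


def build_vector_search_pipeline(
    index_name: str,
    path: str,
    query_vector: list[float],
    num_candidates: int,
    limit: int,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline whose $search stage uses vectorSearch.

    Assumption: the operator takes the same option names as the
    $vectorSearch stage. The text above does not confirm this.
    """
    return [
        {
            "$search": {
                "index": index_name,
                "vectorSearch": {
                    "path": path,
                    "queryVector": query_vector,
                    "numCandidates": num_candidates,
                    "limit": limit,
                },
            }
        }
    ]


# Hypothetical values: a real query vector comes from your embedding model
# and must have as many elements as the index's numDimensions.
pipeline = build_vector_search_pipeline(
    index_name="default",
    path="plot_embedding",
    query_vector=[0.0] * 4,
    num_candidates=100,
    limit=5,
)
print(pipeline)
```

The function only assembles a plain list of dictionaries, so it can be inspected or logged before it is handed to a driver's aggregate call.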

`vector` Type Limitations

The following limitations apply:

You can't index fields with arrays of objects (MongoDB Search embeddedDocuments type) as vector type.
You can't set storedSource to true in index definitions that contain a vector type field. Instead, use include to specify the fields to store on mongot, or use exclude to exclude the vector type field from storage.
You can't use the $vectorSearch stage to query fields indexed as the vector type.
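The storedSource limitation can be sketched in Python as follows. The field names (`plot_embedding`, `title`) are hypothetical, and the object form of `storedSource` with `include` or `exclude` arrays is assumed from the general MongoDB Search index syntax. Treat this as an illustration rather than a verified definition.

```python
from typing import Any

# Hypothetical vector field used by both definitions below.
vector_field = {"type": "vector", "numDimensions": 1024, "similarity": "cosine"}

# Option 1: store everything except the vector field.
definition_with_exclude = {
    "mappings": {"dynamic": True, "fields": {"plot_embedding": vector_field}},
    "storedSource": {"exclude": ["plot_embedding"]},
}

# Option 2: store only the fields that are listed explicitly.
definition_with_include = {
    "mappings": {"dynamic": True, "fields": {"plot_embedding": vector_field}},
    "storedSource": {"include": ["title"]},
}


def has_unsupported_stored_source(definition: dict[str, Any]) -> bool:
    """Return True if storedSource is true while a vector field is present.

    Only top-level entries under mappings.fields are inspected.
    """
    fields = definition.get("mappings", {}).get("fields", {})
    has_vector = any(
        isinstance(field, dict) and field.get("type") == "vector"
        for field in fields.values()
    )
    return has_vector and definition.get("storedSource") is True


print(has_unsupported_stored_source(definition_with_exclude))
print(has_unsupported_stored_source(definition_with_include))
```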

Define the Index for the `vector` Type

Configure `vector` Field Properties

The MongoDB Search vector type takes the following parameters:

Option	Type	Necessity	Description
`type`	`vector`	Required	Human-readable label that identifies this field type. Value must be `vector`.
`numDimensions`	Int	Required	Number of vector dimensions that MongoDB Search enforces at index-time and query-time. You can set this field only for `vector`-type fields. You must specify a value less than or equal to `8192`. For indexing quantized vectors or BinData, you can specify one of the following values: `1` to `8192` for `int8` vectors for ingestion. Multiple of `8` for `int1` vectors for ingestion. `1` to `8192` for `binData(float32)` and `array(float32)` vectors for automatic scalar quantization. Multiple of `8` for `binData(float32)` and `array(float32)` vectors for automatic binary quantization. The embedding model you choose determines the number of dimensions in your vector embeddings, with some models having multiple options for how many dimensions are output. To learn more, see Choosing a Method to Create Embeddings.
`similarity`	String	Required	Vector similarity function to use to search for top K-nearest neighbors. You can set this field only for `vector`-type fields. You can specify one of the following values: `euclidean` - measures the distance between ends of vectors. `cosine` - measures similarity based on the angle between vectors. `dotProduct` - measures similarity like `cosine`, but takes into account the magnitude of the vector. To learn more, see About the Similarity Functions.
`quantization`	String	Optional	Type of automatic vector quantization for your vectors. Use this setting only if your embeddings are `float` or `double` vectors. You can specify one of the following values: `none` - Indicates no automatic quantization for the vector embeddings. Use this setting if you have pre-quantized vectors for ingestion. If omitted, this is the default value. `scalar` - Indicates scalar quantization, which transforms values to 1 byte integers. `binary` - Indicates binary quantization, which transforms values to a single bit. To use this value, `numDimensions` must be a multiple of 8. If precision is critical, select `none` or `scalar` instead of `binary`. To learn more, see Vector Quantization.
`hnswOptions`	Object	Optional	Parameters to use for Hierarchical Navigable Small Worlds graph construction. If omitted, uses the default values for the `maxEdges` and `numEdgeCandidates` parameters. IMPORTANT: This is available as a Preview feature. Modifying the default values might negatively impact your MongoDB Search index and queries.
`hnswOptions.maxEdges`	Int	Optional	Maximum number of edges (or connections) that a node can have in the Hierarchical Navigable Small Worlds graph. Value can be between `16` and `64`, both inclusive. If omitted, defaults to `16`. For example, for a value of `16`, each node can have a maximum of sixteen outgoing edges at each layer of the Hierarchical Navigable Small Worlds graph. A higher number improves recall (accuracy of search results) because the graph is better connected. However, this also increases query and indexing time by increasing the number of neighbors to evaluate per graph node, and requires more memory to store the additional nodes for each connection in the Hierarchical Navigable Small Worlds graph.
`hnswOptions.numEdgeCandidates`	Int	Optional	Analogous to `numCandidates` at query-time, this parameter controls the maximum number of nodes to evaluate to find the closest neighbors to connect to a new node. Value can be between `100` and `3200`, both inclusive. If omitted, defaults to `100`. A higher number provides a graph with high-quality connections, which can improve search quality (recall), but it can also increase query latency.
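The constraints in the table can be collected into a small pre-flight check. The following Python sketch is a hypothetical helper, not part of any MongoDB driver. It encodes only the ranges and allowed values stated in the table, and it assumes a lower bound of `1` for `numDimensions` in all cases.

```python
from typing import Any

SIMILARITIES = {"euclidean", "cosine", "dotProduct"}
QUANTIZATIONS = {"none", "scalar", "binary"}


def validate_vector_field(field: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a vector field definition."""
    problems: list[str] = []

    if field.get("type") != "vector":
        problems.append("type must be 'vector'")

    dims = field.get("numDimensions")
    dims_ok = isinstance(dims, int) and not isinstance(dims, bool) and 1 <= dims <= 8192
    if not dims_ok:
        problems.append("numDimensions must be an integer from 1 to 8192")

    if field.get("similarity") not in SIMILARITIES:
        problems.append("similarity must be euclidean, cosine, or dotProduct")

    # quantization is optional and defaults to "none" when omitted.
    quantization = field.get("quantization", "none")
    if quantization not in QUANTIZATIONS:
        problems.append("quantization must be none, scalar, or binary")
    elif quantization == "binary" and dims_ok and dims % 8 != 0:
        problems.append("binary quantization needs numDimensions to be a multiple of 8")

    # hnswOptions is optional; the defaults are 16 and 100.
    hnsw = field.get("hnswOptions", {})
    if not 16 <= hnsw.get("maxEdges", 16) <= 64:
        problems.append("hnswOptions.maxEdges must be between 16 and 64")
    if not 100 <= hnsw.get("numEdgeCandidates", 100) <= 3200:
        problems.append("hnswOptions.numEdgeCandidates must be between 100 and 3200")

    return problems


example = {
    "type": "vector",
    "numDimensions": 100,
    "similarity": "cosine",
    "quantization": "binary",
}
for problem in validate_vector_field(example):
    print(problem)
```

Running a check like this before submitting an index definition surfaces simple mistakes, such as a `binary` quantization setting paired with a dimension count that is not a multiple of 8.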

Try an Example for the `vector` Type

The following index definition example uses the sample_mflix.embedded_movies collection in the sample data. After you load the collection, you can use the example to index the plot_embedding_voyage_3_large field as the vector type, so that you can query it with the vectorSearch operator. For a sample query to run against this index, see Examples.

This index definition automatically indexes all the dynamically indexable fields using the default typeSet and also indexes the plot_embedding_voyage_3_large field as vector type with the following settings:

2048 dimensions
dotProduct similarity function
scalar quantization
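A definition matching the settings listed above might look like the following Python sketch, which builds the definition as a dictionary and prints it as JSON. The use of `"dynamic": True` to index all dynamically indexable fields with the default typeSet is an assumption based on the general MongoDB Search mappings syntax. The field name and settings come from the description above.

```python
import json

# Sketch of the index definition described above.
index_definition = {
    "mappings": {
        # Assumed way to enable dynamic indexing with the default typeSet.
        "dynamic": True,
        "fields": {
            "plot_embedding_voyage_3_large": {
                "type": "vector",
                "numDimensions": 2048,
                "similarity": "dotProduct",
                "quantization": "scalar",
            }
        },
    }
}

# Print the definition as JSON, for example to paste into an index editor.
print(json.dumps(index_definition, indent=2))
```

With PyMongo, a definition like this could likely be passed to the collection's `create_search_index` method, although that step requires a running deployment and is not shown here.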
