$similarityCosine (expression operator)

New in version 8.3.

$similarityCosine

Returns the cosine similarity between two numeric vectors represented as arrays or binData values. Cosine similarity measures the cosine of the angle between two vectors and indicates how similar their directions are, independent of their magnitudes.
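As an informal illustration of this definition, the following plain JavaScript sketch computes the same quantity outside the database: the dot product of the two vectors divided by the product of their magnitudes. The helper name cosineSimilarity is hypothetical, and the code is not the server's implementation.

```javascript
// Hypothetical helper: cosine similarity of two equal-length numeric arrays.
// Illustrates the formula only; it is not how the server implements the operator.
function cosineSimilarity(a, b) {
  let dot = 0;
  let sumSqA = 0;
  let sumSqB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    sumSqA += a[i] * a[i];
    sumSqB += b[i] * b[i];
  }
  return dot / (Math.sqrt(sumSqA) * Math.sqrt(sumSqB));
}

// Scaling one vector leaves the result unchanged (up to floating-point
// rounding), because only the direction of each vector matters.
console.log(cosineSimilarity([1, 2, 3], [4, 5, 6]));
console.log(cosineSimilarity([1, 2, 3], [40, 50, 60]));
```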

$similarityCosine has two syntax forms.

Concise syntax returns a raw cosine similarity score:

{ $similarityCosine: [ <vector1>, <vector2> ] }

Full syntax accepts an optional score field that controls normalization:

{
   $similarityCosine: {
      vectors: [ <vector1>, <vector2> ],
      score: <boolean>
   }
}

When using the full syntax, $similarityCosine accepts the following fields:

| Field | Type | Necessity | Description |
| --- | --- | --- | --- |
| vectors | Array | Required | Array of exactly two expressions. Each expression must resolve to an array of numeric values or a binData value. Both vectors must have equal length. |
| score | Boolean | Optional | When true, returns a normalized score in the range [0, 1] using the formula (1 + cosine) / 2. Defaults to false. |

For more information on expressions, see Expressions.

If either argument resolves to null or refers to a missing field, $similarityCosine returns null.

If either input vector has a magnitude of zero (that is, all elements are 0), $similarityCosine returns 0.

$similarityCosine returns a double. When score is false (the default), the result is the raw cosine similarity value in the range [-1, 1]:

  • 1 indicates the vectors point in identical directions.

  • 0 indicates the vectors are orthogonal.

  • -1 indicates the vectors point in opposite directions.

When score is true, the result is normalized to the range [0, 1] using the formula (1 + cosine) / 2.
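The normalization formula can be checked against the three reference points listed above. The following is a minimal sketch; the helper name normalizeCosine is hypothetical and exists only for illustration.

```javascript
// Hypothetical helper: map a raw cosine in [-1, 1] to a score in [0, 1]
// using the documented formula (1 + cosine) / 2.
function normalizeCosine(cosine) {
  return (1 + cosine) / 2;
}

console.log(normalizeCosine(-1)); // opposite directions -> 0
console.log(normalizeCosine(0));  // orthogonal -> 0.5
console.log(normalizeCosine(1));  // identical directions -> 1
```

The mapping is linear, so it preserves the ordering of raw scores: a pair of vectors that ranks higher by raw cosine also ranks higher by normalized score.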

$similarityCosine returns an error in the following cases:

  • Either argument does not resolve to an array or binData value.

  • Input arrays or binData values have different lengths.

  • Either array contains non-numeric elements.
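The null, zero-magnitude, and error rules above can be summarized in a plain JavaScript sketch. It models only array inputs (binData is not handled) and only the raw score. The function name, the error types, and the error messages are hypothetical, and the order in which the checks run is an assumption, since the behavior described above does not specify it.

```javascript
// Sketch of the documented edge cases for plain arrays (binData is not modeled).
// Assumption: the null check runs before type validation; the order the server
// actually uses is not stated in the documentation.
function similarityCosineSketch(v1, v2) {
  // A null argument or a missing field (undefined here) yields null.
  if (v1 == null || v2 == null) return null;

  // Error case: an argument that is not an array.
  if (!Array.isArray(v1) || !Array.isArray(v2)) {
    throw new TypeError("both arguments must be arrays");
  }
  // Error case: vectors of different lengths.
  if (v1.length !== v2.length) {
    throw new RangeError("vectors must have the same length");
  }
  // Error case: non-numeric elements.
  const isNumber = (x) => typeof x === "number";
  if (!v1.every(isNumber) || !v2.every(isNumber)) {
    throw new TypeError("vectors must contain only numeric elements");
  }

  let dot = 0;
  let sumSq1 = 0;
  let sumSq2 = 0;
  for (let i = 0; i < v1.length; i++) {
    dot += v1[i] * v2[i];
    sumSq1 += v1[i] * v1[i];
    sumSq2 += v2[i] * v2[i];
  }

  // A zero-magnitude vector yields 0 instead of a division by zero.
  if (sumSq1 === 0 || sumSq2 === 0) return 0;

  return dot / (Math.sqrt(sumSq1) * Math.sqrt(sumSq2));
}

console.log(similarityCosineSketch(null, [1, 2, 3]));   // null
console.log(similarityCosineSketch([0, 0, 0], [1, 2, 3])); // 0
```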

The following example uses a vectors collection:

db.vectors.insertMany( [
   { _id: 1, a: [1, 2, 3], b: [1, 2, 3] },
   { _id: 2, a: [1, 2, 3], b: [3, 2, 1] },
   { _id: 3, a: [1, 2, 3], b: [4, 5, 6] }
] )

The following aggregation pipeline computes the cosine similarity between the a and b fields for each document and returns both the raw score and the normalized score:

db.vectors.aggregate( [
   {
      $project: {
         raw: { $similarityCosine: [ "$a", "$b" ] },
         normalized: {
            $similarityCosine: {
               vectors: [ "$a", "$b" ],
               score: true
            }
         }
      }
   }
] )

The operation returns the following results:

{ _id: 1, raw: 1, normalized: 1 }
{ _id: 2, raw: 0.7142857142857143, normalized: 0.8571428571428571 }
{ _id: 3, raw: 0.9746318461970762, normalized: 0.9873159230985381 }
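A possible follow-up is to rank the documents by similarity. The sketch below defines a pipeline that computes the raw score with the concise syntax and then sorts in descending order with the standard $sort stage. The variable name pipeline and the output field name similarity are illustrative, and the sketch assumes a deployment that supports $similarityCosine.

```javascript
// Pipeline definition only. In mongosh, run it with:
//   db.vectors.aggregate( pipeline )
const pipeline = [
  // Compute the raw cosine similarity between fields a and b.
  { $project: { similarity: { $similarityCosine: [ "$a", "$b" ] } } },
  // Most similar documents first.
  { $sort: { similarity: -1 } }
];
```

Given the raw scores shown in the results above, this ordering would place _id: 1 first, followed by _id: 3 and then _id: 2.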
