Definition
New in version 8.3.
$similarityCosine
Returns the cosine similarity between two numeric vectors represented as arrays or binData values. Cosine similarity measures the cosine of the angle between two vectors and indicates how similar their directions are, independent of their magnitudes.

$similarityCosine has two syntax forms.

Concise syntax returns a raw cosine similarity score:

{ $similarityCosine: [ <vector1>, <vector2> ] }

Full syntax accepts an optional normalization parameter:

{ $similarityCosine: { vectors: [ <vector1>, <vector2> ], score: <boolean> } }

When using the full syntax, $similarityCosine accepts the following fields:

| Field | Type | Necessity | Description |
| vectors | Array | Required | Array of exactly two expressions. Each expression must resolve to an array of numeric values or a binData value. Both vectors must have equal length. |
| score | Boolean | Optional | When true, returns a normalized score in the range [0, 1] using the formula (1 + cosine) / 2. Defaults to false. |

For more information on expressions, see Expressions.
Behavior
null and Missing Values
If either argument resolves to null or refers to a missing
field, $similarityCosine returns null.
Zero-Magnitude Vectors
If either input vector has a magnitude of zero (that is, all
elements are 0), $similarityCosine returns 0.
Return Value
$similarityCosine returns a double. When
score is false (the default), the result is the raw cosine
similarity value in the range [-1, 1]:
- 1 indicates the vectors point in identical directions.
- 0 indicates the vectors are orthogonal.
- -1 indicates the vectors point in opposite directions.
When score is true, the result is normalized to the range
[0, 1] using the formula (1 + cosine) / 2.
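For reference, the two return values can be written out using the standard definition of cosine similarity. The symbols a, b, and n (the shared vector length) are introduced here for illustration and do not appear in the operator syntax.

```latex
\[
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^2}\;\sqrt{\sum_{i=1}^{n} b_i^2}},
\qquad
\text{score} = \frac{1 + \cos\theta}{2}
\]
```

Under this normalization, a raw value of -1 maps to 0, a raw value of 0 maps to 0.5, and a raw value of 1 maps to 1.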
Errors
$similarityCosine returns an error in the following
cases:
- Either argument does not resolve to an array or binData value.
- Input arrays or binData values have different lengths.
- Either array contains non-numeric elements.
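The rules in the Behavior and Errors sections can be sketched client-side in plain JavaScript, which may help when checking expected values outside the server. This is only an illustration, not the server implementation: cosineSimilaritySketch is a hypothetical helper name, binData inputs are not handled, and the error types are arbitrary choices. The text does not say whether score: true changes the result for zero-magnitude vectors, so the sketch follows the text literally and returns 0 in both cases.

```javascript
// Hypothetical client-side sketch of the documented $similarityCosine rules.
// Not a MongoDB API. Handles plain numeric arrays only (no binData).
function cosineSimilaritySketch(v1, v2, score = false) {
  // null or missing (undefined) arguments yield null.
  if (v1 == null || v2 == null) return null;

  // Error case: an argument is not an array.
  if (!Array.isArray(v1) || !Array.isArray(v2)) {
    throw new TypeError("both arguments must be arrays");
  }
  // Error case: the arrays have different lengths.
  if (v1.length !== v2.length) {
    throw new RangeError("vectors must have equal length");
  }

  let dot = 0;
  let sq1 = 0;
  let sq2 = 0;
  for (let i = 0; i < v1.length; i++) {
    const x = v1[i];
    const y = v2[i];
    // Error case: a non-numeric element.
    if (typeof x !== "number" || typeof y !== "number") {
      throw new TypeError("vector elements must be numeric");
    }
    dot += x * y;
    sq1 += x * x;
    sq2 += y * y;
  }

  // Zero-magnitude vectors: the text says the operator returns 0.
  // Assumption: returned as-is even when score is true.
  if (sq1 === 0 || sq2 === 0) return 0;

  const cosine = dot / Math.sqrt(sq1 * sq2);
  // score: true rescales [-1, 1] to [0, 1] with (1 + cosine) / 2.
  return score ? (1 + cosine) / 2 : cosine;
}
```

For instance, cosineSimilaritySketch([1, 2, 3], [3, 2, 1]) evaluates 10 / 14, roughly 0.714, and cosineSimilaritySketch(null, [1, 2]) returns null.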
Example
The following example uses a vectors collection:
db.vectors.insertMany( [
   { _id: 1, a: [1, 2, 3], b: [1, 2, 3] },
   { _id: 2, a: [1, 2, 3], b: [3, 2, 1] },
   { _id: 3, a: [1, 2, 3], b: [4, 5, 6] }
] )
The following aggregation pipeline computes the cosine similarity
between the a and b fields for each document and returns
both the raw score and the normalized score:
db.vectors.aggregate( [
   {
      $project: {
         raw: { $similarityCosine: [ "$a", "$b" ] },
         normalized: {
            $similarityCosine: { vectors: [ "$a", "$b" ], score: true }
         }
      }
   }
] )
The operation returns the following results:
{ _id: 1, raw: 1, normalized: 1 }
{ _id: 2, raw: 0.7142857142857143, normalized: 0.8571428571428571 }
{ _id: 3, raw: 0.9746318461970762, normalized: 0.9873159230985381 }
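As a check on these results, the values for the document with _id: 2 can be worked out by hand from the standard cosine similarity definition and the (1 + cosine) / 2 formula. The symbols a and b stand for that document's two fields.

```latex
\[
\begin{aligned}
\mathbf{a}\cdot\mathbf{b} &= 1\cdot 3 + 2\cdot 2 + 3\cdot 1 = 10, \\
\lVert\mathbf{a}\rVert = \lVert\mathbf{b}\rVert &= \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}, \\
\text{raw} &= \frac{10}{\sqrt{14}\,\sqrt{14}} = \frac{10}{14} \approx 0.7143, \\
\text{normalized} &= \frac{1 + 10/14}{2} = \frac{12}{14} \approx 0.8571.
\end{aligned}
\]
```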