Hi @williamwjs,
Have you tried using the constant scoring option? I believe the scoring behaviour you’ve described is expected (at least from analyzing the sample documents). Based on your example, there are a total of 5 documents, of which:
- 2 documents contain the value "storage" in the "tags" array field
- 3 documents contain the value "school" in the "tags" array field
As per the Score the Documents in the Results documentation:
Many factors can influence a document’s score, including:
- The position of the search term in the document,
- The frequency of occurrence of the search term in the document,
- The type of operator the query uses,
- The type of analyzer the query uses.
In this particular case, I believe the frequency of occurrence is one of the main factors behind the different scores, even though each document contains each term only once: "storage" occurs in fewer documents across the collection than "school", so a match on it scores higher. Below is an example from my test environment, based on the sample documents you provided:
5 documents total; in the tags array, 2 documents contain “storage” and 3 documents contain “school”:
[
{ tags: [ 'office', 'storage' ], score: 0.47005030512809753 },
{ tags: [ 'storage' ], score: 0.47005030512809753 },
{ tags: [ 'school', 'office' ], score: 0.2893940806388855 },
{ tags: [ 'home', 'school' ], score: 0.2893940806388855 },
{ tags: [ 'school' ], score: 0.2893940806388855 }
]
We can see here the first 2 documents have a higher score (probably what you are experiencing).
Now, let’s add another document that contains the "storage" value inside the "tags" array and perform the same search.
6 documents total; in the tags array, 3 documents contain “storage” and 3 documents contain “school”:
[
{ tags: [ 'school', 'office' ], score: 0.3767103850841522 },
{ tags: [ 'home', 'school' ], score: 0.3767103850841522 },
{ tags: [ 'office', 'storage' ], score: 0.3767103850841522 },
{ tags: [ 'school' ], score: 0.3767103850841522 },
{ tags: [ 'storage' ], score: 0.3767103850841522 },
{ tags: [ 'storage', 'home' ], score: 0.3767103850841522 }
]
We can see here that the scores are now all the same for this result set.
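To make the frequency effect concrete, here is a rough sketch of the arithmetic. It assumes the search engine scores with a Lucene-style BM25 formula, which is my assumption and not something stated in the documentation quoted above, and it computes only the inverse-document-frequency (IDF) part of the score. The function name bm25Idf is made up for illustration. The real score also includes term-frequency and length factors, so only the ratio between the two terms is compared here, not the absolute values.

```javascript
// Assumption: Lucene-style BM25 IDF,
//   idf = ln(1 + (N - n + 0.5) / (n + 0.5))
// where N = total documents and n = documents containing the term.
function bm25Idf(totalDocs, docsWithTerm) {
  return Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
}

// 5 documents: "storage" appears in 2, "school" appears in 3.
const idfStorage5 = bm25Idf(5, 2);
const idfSchool5 = bm25Idf(5, 3);

// 6 documents: "storage" and "school" each appear in 3.
const idfStorage6 = bm25Idf(6, 3);
const idfSchool6 = bm25Idf(6, 3);

// Ratio of the scores observed in the 5-document result set above.
const observedRatio5 = 0.47005030512809753 / 0.2893940806388855;

console.log("IDF ratio, 5 docs:", idfStorage5 / idfSchool5); // ≈ 1.624
console.log("Observed score ratio:", observedRatio5);        // ≈ 1.624
console.log("IDF equal with 6 docs:", idfStorage6 === idfSchool6); // true
```

Under that assumption, the rarer term gets the larger weight in the 5-document case, and the weights become identical once both terms appear in the same number of documents, which is in line with the two result sets above.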
(The test environment I’m using for this has the index named "tagsindex".) Reverting to the original 5 documents and using a constant scoring value of 1:
db.tags.aggregate([
  {
    "$search": {
      "index": "tagsindex",
      "compound": {
        "should": [
          {
            "queryString": {
              "defaultPath": "tags",
              "query": "school OR storage",
              // Give every match the same fixed score instead of a relevance score
              "score": { "constant": { "value": 1 } }
            }
          }
        ]
      }
    }
  },
  {
    // Return only the tags and the search score for each match
    "$project": {
      "_id": 0,
      "tags": 1,
      "score": { "$meta": "searchScore" }
    }
  }
])
Output:
[
{ tags: [ 'storage' ], score: 1 },
{ tags: [ 'school' ], score: 1 },
{ tags: [ 'office', 'storage' ], score: 1 },
{ tags: [ 'home', 'school' ], score: 1 },
{ tags: [ 'school', 'office' ], score: 1 }
]
Let me know whether this works for your use case and whether the explanation above helps with the scoring differences you are seeing.
Regards,
Jason