Here is an example that shows the unexpected results, which seem very “sentence centric”.
db.stock_news.remove( {} );
db.stock_news.insertMany([
  {
    author: "Nasdaq Technology Sector Update",
    text: "Technology giants were gaining Thursday. Early movers include MongoDB, Inc. which gained more than 20%. Microsoft also gained 8%."
  },
  {
    author: "PRNewswire",
    text: "Hello world. MongoDB, Inc. (NASDAQ: MDB) is the leading modern database platform. Where is this text?"
  }
]);
The FTS index was defined with the name “myFtsIndex” as:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "text": {
        "type": "string",
        "analyzer": "lucene.standard",
        "multi": {
          "keywordAnalyzer": {
            "type": "string",
            "analyzer": "lucene.keyword"
          }
        }
      }
    }
  }
}
The following query demonstrates the unexpected results.
(Note: I did not use the default index name, so you see the index name “myFtsIndex” below.)
db.stock_news.aggregate([
  {
    $searchBeta: {
      index: "myFtsIndex",
      "search": {
        "query": "MongoDB",
        "path": "text"
      },
      "highlight": {
        "path": "text"
      }
    }
  },
  {
    $project: {
      "text": 1,
      "_id": 0,
      "highlights": { "$meta": "searchHighlights" }
    }
  }
]).pretty()
This returns:
{
  "text" : "Hello world. MongoDB, Inc. (NASDAQ: MDB) is the leading modern database platform. Where is this text?",
  "highlights" : [
    {
      "path" : "text",
      "texts" : [
        {
          "value" : "MongoDB",
          "type" : "hit"
        },
        {
          "value" : ", Inc. ",
          "type" : "text"
        }
      ],
      "score" : 1.8908861875534058
    }
  ]
}
{
  "text" : "Technology giants were gaining Thursday. Early movers include MongoDB, Inc. which gained more than 20%. Microsoft also gained 8%.",
  "highlights" : [
    {
      "path" : "text",
      "texts" : [
        {
          "value" : "Early movers include ",
          "type" : "text"
        },
        {
          "value" : "MongoDB",
          "type" : "hit"
        },
        {
          "value" : ", Inc. which gained more than 20%. ",
          "type" : "text"
        }
      ],
      "score" : 1.4883723258972168
    }
  ]
}
In the first result document, the highlights are seemingly missing:
- the “Hello world.” sentence in front of the match,
- the rest of the sentence containing the match, and
- the subsequent sentence(s) after the sentence containing the match.
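A similar observation applies to the second result, where the sentences before and after the one containing the match are absent from the highlights.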
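To make the gap concrete, here is a minimal sketch in plain JavaScript (written for Node.js; it does not query the database) that joins the "texts" fragments of one highlight entry and reports which parts of the original "text" field are not covered. The helper name uncoveredText is hypothetical and of my own making, not part of any MongoDB API, and the data is copied by hand from the first result above.

```javascript
// Hypothetical helper (not a MongoDB API): joins the fragments of one
// highlight entry and splits the original text into the parts that come
// before and after the highlighted passage.
function uncoveredText(fullText, highlight) {
  const passage = highlight.texts.map(t => t.value).join("");
  const start = fullText.indexOf(passage);
  if (start === -1) {
    // The joined fragments do not appear verbatim in the text.
    return { passage: passage, before: null, after: null };
  }
  return {
    passage: passage,
    before: fullText.slice(0, start),
    after: fullText.slice(start + passage.length)
  };
}

// Data copied from the first result document in this post.
const firstResult = {
  text: "Hello world. MongoDB, Inc. (NASDAQ: MDB) is the leading modern database platform. Where is this text?",
  highlights: [
    {
      path: "text",
      texts: [
        { value: "MongoDB", type: "hit" },
        { value: ", Inc. ", type: "text" }
      ],
      score: 1.8908861875534058
    }
  ]
};

const gap = uncoveredText(firstResult.text, firstResult.highlights[0]);
console.log("highlighted passage:", JSON.stringify(gap.passage));
console.log("missing before:", JSON.stringify(gap.before));
console.log("missing after:", JSON.stringify(gap.after));
```

For the first result this reports "Hello world. " as missing before the passage, and everything from "(NASDAQ: MDB)" onward as missing after it, which matches the three points listed above.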
This is the basis for my confusion about the highlights data, given the documentation at
https://docs.atlas.mongodb.com/reference/full-text-search/highlighting/
Thank you for your help,
Bill