I have a simple blog website where different authors can create articles, each consisting of a title and a content field. I used MongoDB due to a legacy implementation and other factors.
The website has a search bar where users can run a free-text search to get a list of relevant articles. In the backend, I use a MongoDB text index and $text search to run the queries.
This is how I created the index (title has more weight):
db.articles.createIndex(
  { title: "text", content: "text" },
  {
    weights: {
      title: 10,
      content: 1
    },
    name: "ArticleIndex"
  }
)
Example query:
db.articles.find(
  { $text: { $search: "coffee bake" } },
  { score: { $meta: "textScore" } }
).sort(
  // _id is a separate sort key, not part of the $meta expression
  { score: { $meta: "textScore" }, _id: -1 }
)
In the query, I added _id: -1 to the sort so that the most recently created articles come first when there is a tie.
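To make the intended ordering concrete, here is a minimal in-memory sketch of the tie-break in plain JavaScript. It does not talk to MongoDB. The sample documents, their score values, the numeric _id values (real ones would normally be ObjectIds), and the helper name sortByScoreThenNewest are all made up for illustration.

```javascript
// Hypothetical sample data: scores and numeric _id values are invented.
// A larger _id stands in for a more recently created article.
const articles = [
  { _id: 1, title: "Coffee basics", score: 1.5 },
  { _id: 2, title: "Bake with coffee", score: 2.0 },
  { _id: 3, title: "Coffee and bake sale", score: 2.0 },
];

// Sort by score descending; when scores are equal, fall back to _id descending.
function sortByScoreThenNewest(docs) {
  return [...docs].sort((a, b) => b.score - a.score || b._id - a._id);
}

const ordered = sortByScoreThenNewest(articles).map((d) => d._id);
console.log(ordered); // [ 3, 2, 1 ]
```

Articles 2 and 3 tie on score, so the newer one (3) comes first, and article 1 follows with the lower score.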
Now the problem is that some authors try to manipulate the ranking by repeating certain keywords many times in the content, to the extent that it looks quite obvious. For instance, most of my users search for the city name New York. Hence, a particular author spams the phrase New York all over the content. As a result, his article gets a high text score from MongoDB's $text search and always appears at the top.
Is there a way in MongoDB $text search to ignore multiple occurrences of the search keywords? Also, is there a way to make the _id field contribute to the sorting score, i.e. so that the latest items get higher scores?