let’s say I have a collection like so:
{
"id": "2902-48239-42389-83294",
"data": {
"location": [
{
"country": "Italy",
"city": "Rome"
}
],
"time": [
{
"timestamp": "1626298659",
"data":"2020-12-24 09:42:30"
}
],
"details": [
{
"timestamp": "1626298659",
"data": {
"url": "https://example.com",
"name": "John Doe",
"email": "john@doe.com"
}
},
{
"timestamp": "1626298652",
"data": {
"url": "https://www.myexample.com",
"name": "John Doe",
"email": "doe@john.com"
}
},
{
"timestamp": "1626298652",
"data": {
"url": "http://example.com/sub/directory",
"name": "John Doe",
"email": "doe@johnson.com"
}
}
]
}
}
Now the main focus is on the array of subdocument(“data.details”): I want to get output only of relevant matches e.g:
db.info.find({"data.details.data.url": "example.com"})
-
How can I get a match for all “data.details.data.url” contains “example.com” but won’t match with “myexample.com”. When I do it with $regex I get too many results, so if I query for “example.com” it also return “myexample.com”
-
Even when I do get partial results (with $match), It’s very slow. I tried this aggregation stages :
-
{ $unwind: "$data.details" }, { $match: { "data.details.data.url": /.*example.com.*/, }, }, { $project: { id: 1, "data.details.data.url": 1, "data.details.data.email": 1, }, },
3. I really don't understand the pattern, with $match, sometimes Mongo do recognize prefixes like "https://" or "https://www." and sometime it does not.
More info: My collection has dozens of GB, I created two indexes:
Compound like so: "data.details.data.url": 1, "data.details.data.email": 1
Text Index: "data.details.data.url": "text", "data.details.data.email": "text"
It did improve the query performance but not enough and I still have this issue with the $match vs $regex. Thanks for helpers!