Hello
My team has a collection of ~45 million documents, each containing a timestamp and a geojson field.
The main use case is querying across both the timestamp and the geojson.
The team had a few solutions:
-
Using a compound index for the two fields - performance wasn’t on par with product expectations - on some cases search time was over 10 seconds.
-
Using the bucket pattern (bucket per time range) for reducing search time, but because the query is across two fields we couldn’t properly use the geojson index.
-
Bucketing by collection, i.e. a collection for every time range. This enables the use of the geojson index while still using buckets. I feel like that’s not a standard solution and requires more implementation at the application layer. This solution however, has the best performance - sub 1 second search time
I would like to hear from you your opinions about the solutions and whether there are some modeling patterns that I am missing
Hi @Matan_Shaked ,
Can you share some document samples, indexes and queries so we can get a better feel of what you are describing.
If you found a performant solution while still with manageable coding overhead than that is perfectly fine and shows how mongoDB allows you to do whats best for you.
Thanks
Pavel
Here is a minimal representation of our documents:
{
"_id" : ObjectId("63639134eff35d0dcb9b0aae"),
"geoJson" : {
"type" : "Feature",
"properties" : {
"timestamp" : 1667469522345.0,
"sensorId" : "12345"
},
"geometry" : {
"type" : "Point",
"coordinates" : [
9.1594001938029,
45.4863023702389
]
}
}
}
This is one of the indexes that we tried:
{
"geoJson.properties.sensorId" : 1.0,
"geoJson.properties.timestamp" : 1.0,
"geoJson.geometry" : "2dsphere"
}
And here is an example query:
db.getCollection('telemetries').aggregate([
{
$match: {
$and: [
{
"geoJson.geometry": {
$geoWithin: {
$center: [
[
9.159400193802895,
45.48630237023892
],
0.3
]
}
}
}, {
$and: [
{ "geoJson.properties.timestamp": { $gt: 1647469522345 } },
{ "geoJson.properties.timestamp": { $lt: 1687469522345 } }
]
}
]
}
}, {
$group: {
_id: "$geoJson.properties.sensorId"
}
}
]);
Hi @Matan_Shaked ,
It looks as an index order of fields might work better if you index:
{
"geoJson.geometry" : "2dsphere",
"geoJson.properties.sensorId" : 1.0,
"geoJson.properties.timestamp" : 1.0
}
Consider that since it follows better the ESR rule:
Thanks
Pavel