Aggregation pipeline geoNear, geoWithin question

I’m struggling with the follow problem.

I’m making an aggregation pipeline (testing in Compass) to find 10 places near a geolocation (geoNear).

I also would like to know if those places fit within a polygon (geoWithin), however the result shouldn’t be limited by the fact that those places fit in that polygon.

That’s the struggle I have.
My need:

  • Run geoNear
  • Add a field “is_in_polygon” based on a geoWithin query.
    (however I can’t figure out how to use $addField in combination with $match)
  • sort the results on “is_in_polygon”: true
  • limit the results to 12.

In the case nothing is within the polygon I still would like to get 12 results back.
In the case only 6 out of 12 are within the polygon, I would like to get 12 results back, the first 6 with is_in_polygon: true, the other 6 with is_in_polygon: false

I would like to do this in one pipeline.
I was thinking in working with $facet, however $geoNear cannot be used within $facet, it always has to be the first stage in the pipeline.

Is it possible to use $addField (or $set, $project) based on a $match (query) outcome?
Is it possible to sort on that boolean field (so results with that specific field come first.

Below my current pipeline.

[{
 $geoNear: {
  near: {
   type: 'Point',
   coordinates: [
    5.092047,
    51.984694
   ]
  },
  key: 'geo',
  spherical: true,
  distanceField: 'dist',
  query: {
   access: {
    $ne: 'quests_only'
   }
  }
 }
}, {
 $match: {
  geo: {
   $geoWithin: {
    $geometry: {
     type: 'Polygon',
     coordinates: [
      [
       [
        7.02324,
        52.3444473
       ],
       [
        6.5975198,
        52.3712859
       ],
       [
        6.4958963,
        52.1763364
       ],
       [
        6.9133767,
        52.1544352
       ],
       [
        7.02324,
        52.3444473
       ]
      ]
     ]
    }
   }
  }
 }
}, {
 $limit: 12
}]

I don’t have a solution (at least not yet) but moving the “limit” stage just after “geoNear” will help much as you don’t need any more data than those 12 for the remaining stages.

You may then try the “facet” after that and have 2 new lists that are “all” and “within the shape”.

I think I have a working solution. This topic in stackoverflow helped me: MongoDB: adding fields based on partial match query - expression vs query - Stack Overflow

Although I’m wondering if I can use the size of one $facet array to define the length of the other $facet array.

If there is a more optimized way to do it, please let me know. I would like to get at least 12 results back. It could be that inside has 0 or 1 result, hence I also put that 12 limit on the outside polygon.

[{
 $geoNear: {
  near: {
   type: 'Point',
   coordinates: [
    5.092047,
    51.984694
   ]
  },
  key: 'geo',
  spherical: true,
  distanceField: 'dist',
  query: {
   access: {
    $ne: 'quests_only'
   }
  }
 }
}, {
 $facet: {
  inside: [
   {
    $match: {
     geo: {
      $geoWithin: {
       $geometry: {
        type: 'Polygon',
        coordinates: [
         [
          [
           7.02324,
           52.3444473
          ],
          [
           6.5975198,
           52.3712859
          ],
          [
           6.4958963,
           52.1763364
          ],
          [
           6.9133767,
           52.1544352
          ],
          [
           7.02324,
           52.3444473
          ]
         ]
        ]
       }
      }
     }
    }
   },
   {
    $limit: 12
   }
  ],
  outside: [
   {
    $match: {
     geo: {
      $not: {
       $geoWithin: {
        $geometry: {
         type: 'Polygon',
         coordinates: [
          [
           [
            7.02324,
            52.3444473
           ],
           [
            6.5975198,
            52.3712859
           ],
           [
            6.4958963,
            52.1763364
           ],
           [
            6.9133767,
            52.1544352
           ],
           [
            7.02324,
            52.3444473
           ]
          ]
         ]
        }
       }
      }
     }
    }
   },
   {
    $limit: 12
   }
  ]
 }
}, {
 $project: {
  both: {
   $concatArrays: [
    '$inside',
    '$outside'
   ]
  }
 }
}, {
 $unwind: {
  path: '$both'
 }
}, {
 $replaceRoot: {
  newRoot: '$both'
 }
}, {
 $limit: 14
}]

this seems pretty plausible but there are two things in this version of your query (let’s call v1 if we need later :slight_smile: )

1- you are using every single document in your facet stage. you just need 12 of them. that is why I told you to move the limit stage after the geoNear.

  • this way you also do not need limit stages inside the facet stage.

2- you find inside/outside documents and later combine them together, but then you do not add any in/out indicator so it becomes the same result geoNear gives.

  • in your facet stages, add your “true/false” indicator with addField to your inside/outside results so that when combined they will have correct identifiers.

I tried this, however I just get the nearest results based on distance. In my scenario there are places that are further away that fit within the polygon. In polygon results have to appear first, however if I limit after geoNear I exclude them.

I would love the limit, because I think it could make it more performant.

Thanks for the idea, indeed I need to add an identifier.

V2 with an addedField (used $set, alias of $addField):

[{
 $geoNear: {
  near: {
   type: 'Point',
   coordinates: [
    5.092047,
    51.984694
   ]
  },
  key: 'geo',
  spherical: true,
  distanceField: 'dist',
  query: {
   access: {
    $ne: 'quests_only'
   }
  }
 }
}, {
 $facet: {
  inside: [
   {
    $match: {
     geo: {
      $geoWithin: {
       $geometry: {
        type: 'Polygon',
        coordinates: [
         [
          [
           7.02324,
           52.3444473
          ],
          [
           6.5975198,
           52.3712859
          ],
          [
           6.4958963,
           52.1763364
          ],
          [
           6.9133767,
           52.1544352
          ],
          [
           7.02324,
           52.3444473
          ]
         ]
        ]
       }
      }
     }
    }
   },
   {
    $limit: 12
   },
   {
    $set: {
     inside: 1
    }
   }
  ],
  outside: [
   {
    $match: {
     geo: {
      $not: {
       $geoWithin: {
        $geometry: {
         type: 'Polygon',
         coordinates: [
          [
           [
            7.02324,
            52.3444473
           ],
           [
            6.5975198,
            52.3712859
           ],
           [
            6.4958963,
            52.1763364
           ],
           [
            6.9133767,
            52.1544352
           ],
           [
            7.02324,
            52.3444473
           ]
          ]
         ]
        }
       }
      }
     }
    }
   },
   {
    $limit: 12
   },
   {
    $set: {
     inside: 0
    }
   }
  ]
 }
}, {
 $project: {
  both: {
   $concatArrays: [
    '$inside',
    '$outside'
   ]
  }
 }
}, {
 $unwind: {
  path: '$both'
 }
}, {
 $replaceRoot: {
  newRoot: '$both'
 }
}, {
 $limit: 12
}]

Is there a way to get all the documents that are not in the first facet array. Now I do a double geoWithin query (although the limit is 12 I don’t know if this has performance impacts).

This was your description for why I was telling to use limit after geoNear :slight_smile: Your v2 has no problem if you have changed mind. But still be aware that although you use limits in them facet stages will possibly use all documents in this shape. I am not sure if Mongodb optimizes these queries somehow.

for combining the two facet results, check if “setUnion” operator does better than “concatArrays”. I haven’t used it for this kind of operation but seem promising.

1 Like