"Conditional sort", a data modelling problem

ilari · November 16, 2022, 7:39am

In case someone else has a similar problem, I’ve come up with at least a solution that works, not necessarily the solution:

db.profiles.aggregate([
    // Match only profiles with given location
    {
      "$match": {
        "locations.adm1_id": "new-york"
      }
    },
    // Unwind locations
    {
      "$unwind": "$locations" 
    },
    // The unwinded working set might have profiles with non-relevant locations.
    // Match only unwinded profiles with given location.
    {
      "$match": {
        "locations.adm1_id": "new-york" 
      }
    },

    // Sort profiles with highlighted location first, second order by
    // orderKey-field, so that the next stage picks up the highlighted profile.
    {
      "$sort": { 
        "locations.highlight": -1,
        "orderKey": 1
      }
    },
    // Now we might have duplicates if a profile had two locations with same
    // adm1_id, so group by _id, grabbing the `$first` values
    {
      "$group": {
        "_id": "$_id",
        "name": {
          "$first": "$name"
        },
        "locations": {
          "$first": "$locations"
        },
        "orderKey": {
          "$first": "$orderKey"
        }
      }
    },
    // $group messes up the sort ordering, so we sort again!
    {
      "$sort": {
        "locations.highlight": -1,
        "orderKey": 1
      }
    },
    // We want to skip & limit our results, but are also interested in the total amount of results.
    {
      "$facet": {
        "paginatedResults": [
          {
            "$skip": 24 // skip whatever is needed
          },
          {
            "$limit": 24 // limit to page size
          },
          // use project to make sure we only return safe fields,
          // $group stage is optional and used only if adm1_id was provided.
          {
            "$project": {
              "name": 1,
              "locations": 1,
              "orderKey": 1
            }
          }
        ],
        // Get total count of result documents
        "totalCount": [
          {
            "$count": "count"
          }
        ]
      }
    }
  ]);

It is obviously slower than a plain .find() on the index, but a typical query within a collection of 5000 documents takes approx. ~12-20ms, or ~20-30ms, measured from the node client.

Any ideas how to achieve the same, but better and faster, are very much welcome. My aggregate-knowledge is very limited.