How to get document count and paginated results in a single call

I recently added pagination to our GQL API using the bucket pattern (timestamp + ObjectID comparison). The paging article on the MongoDB blog was very helpful in getting me started.

I also followed the GraphQL-recommended Relay pattern (see docs), which calls for returning a totalCount field (e.g. the total number of products or social media connections, regardless of pagination).

How do I avoid sending two separate queries to the DB server, one to get the total count and one for the current page/bucket?

My current query

    const filters = { ... }  // only items after a timestamp & _id
    const rs = await this.mediaObjectModel.aggregate<MediaObject>([
      { $match: filters },  // a bare filter document must be wrapped in a $match stage
      {
        $sort: { createdAt: -1, _id: -1 }
      },
      {
        $limit: first // bucket size (items per page)
      }
    ])

There is no mechanism to send back both a count and a cursor with the result set, so you can either do two queries as you are doing now, or play some tricks with $unionWith to get the same data in one pipeline. It seems like that won’t really save you much, though (and it may make it more complex to parse out the result document that represents the count vs. the other documents that represent your page content).
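For reference, the $unionWith trick mentioned above (available since MongoDB 4.4) could look roughly like this. This is only a sketch: the collection name `"mediaobjects"` and the helper `buildUnionWithPipeline` are assumptions for illustration, not from the thread, and `filters` is assumed to be a plain match document.

```javascript
// Hypothetical sketch of the $unionWith approach (MongoDB 4.4+).
// Builds one pipeline that returns the page documents plus a single
// extra { totalCount: N } document appended at the end.
function buildUnionWithPipeline(filters, first, collName) {
  return [
    { $match: filters },                    // only items after timestamp & _id
    { $sort: { createdAt: -1, _id: -1 } },
    { $limit: first },                      // bucket size (items per page)
    {
      $unionWith: {
        coll: collName,                     // same collection, scanned again for the count
        pipeline: [
          { $match: filters },
          { $count: "totalCount" }
        ]
      }
    }
  ];
}

// Usage (names assumed):
// const rs = await mediaObjectModel.aggregate(
//   buildUnionWithPipeline(filters, first, "mediaobjects"));
// The last document in rs is { totalCount: N }; the rest are the page,
// which is the parsing complexity the reply warns about.
```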

Asya

Can you try using a facet with one stage for a count and one for the initial records?

Thanks everyone. I didn’t know this was such a common task when doing pagination. $facet and ‘pagination’ are the search keywords.

Be wary of .skip with lots of results; it can be slow. Keep track of the last id and get results after that one next time; that way an index can be used.
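The "track the last id" approach is just a filter on the sort keys of the last document from the previous page. A rough sketch, assuming the `createdAt` + `_id` sort from the original query (the helper name is made up for illustration):

```javascript
// Hypothetical sketch of keyset ("bucket") pagination: instead of .skip(),
// filter on the sort keys of the last document of the previous page so an
// index on { createdAt: -1, _id: -1 } can seek straight to the next bucket.
function nextPageFilter(lastDoc) {
  if (!lastDoc) return {};  // first page: no cursor yet
  return {
    $or: [
      { createdAt: { $lt: lastDoc.createdAt } },
      // tie-break: same timestamp, strictly smaller _id
      { createdAt: lastDoc.createdAt, _id: { $lt: lastDoc._id } }
    ]
  };
}

// Usage (names assumed):
// const filters = nextPageFilter(lastDocOfPreviousPage);
// db.mediaObjects.find(filters).sort({ createdAt: -1, _id: -1 }).limit(first);
```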


I would recommend against using $facet as performance-wise it’s going to be worse than just about any other option.

$facet is also limited to 16MB document size so that may also create an issue if the number of documents you are returning is large enough.

Asya


Asya, I agree: you need to be wary of using $facet, as only the data fed into a facet can make use of indexes, and because the output of the facet is formed into an array within one document, you are limited to 16MB.

I’ve run into the limit a few times, with a data-checking report that used a first-stage match and then about 40 facets to run data-validation checks. In this case it was pretty performant compared to making 40 calls over the same filtered set of data, but, as you say, the document-size limit should be kept in mind if you have either a lot of documents or large ones.

I would recommend against using $facet as performance-wise it’s going to be worse than just about any other option.

Can you please elaborate? As mentioned in the original question, the alternative would be making two separate calls: one to get the current page data and another to get the total count. $facet allows you to send the two queries as one. Provided the search fields are properly indexed, I’m not clear on why it wouldn’t be as performant.

@Asya_Kamsky

Have a play with your dataset, but testing locally with 1M records and a non-covering index that was used on the initial stage of the aggregation, I get slightly different timings: just shy of 200ms for a straight count and find vs. about 250ms for the facet query.

    db.getCollection("Test").find({ name: /^B/ }).count()
    db.getCollection("Test").find({ name: /^B/ }).limit(1000)

    db.getCollection("Test").explain().aggregate([
      {
        $match: {
          name: /^B/
        }
      },
      {
        $facet: {
          totalDocs: [
            {
              $count: 'count'
            }
          ],
          Docs: [
            {
              $limit: 1000
            }
          ]
        }
      }
    ])

I imagine there is overhead associated with setting up the facets and combining the data into the array; you then also have the overhead in the client of working with the documents inside the array, as opposed to just a list of docs.


This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.