Paging with count - Performance $facet vs. simple query

Hi all,

I’m implementing the well-known task for pagination with a count over a collection.
The discussion comes down to measuring performance of two implementations:

The first implementation is by using $facet, and the second is by using a simple query and performing two calls (one for the count and another for the results) on it.

I’ve done some simple tests against the same collection, and the results surprised me…
The test is done using .Net 5 and MongoDB.Driver (2.12.3):

            var dateTimeFrom = new DateTime(2022, 6, 16, 0, 3, 53, DateTimeKind.Utc);

            var matchStage = new BsonDocument("$match",
                new BsonDocument("utcCreated",
                    new BsonDocument("$gte", dateTimeFrom)));

            var facetStage = new BsonDocument("$facet",
                new BsonDocument
                {
                    {
                        "metadata",
                        new BsonArray
                        {
                            new BsonDocument("$count", "total")
                        }
                    },
                    {
                        "data",
                        new BsonArray
                        {
                            new BsonDocument("$skip", 100),
                            new BsonDocument("$limit", 10)
                        }
                    }
                });

            var pipeline = new[]
            {
                matchStage,
                facetStage
            };

            var pipelineDefinition = PipelineDefinition<RawMessageDto, RawMessageCountModel>.Create(pipeline);

            var watch = Stopwatch.StartNew();

            var elapsedList1 = new List<long>();

            for (var i = 0;
                 i < 10;
                 i++)
            {
                watch.Restart();

                var result = await _rawMessageCollection
                    .Aggregate(pipelineDefinition)
                    .FirstOrDefaultAsync();
                elapsedList1.Add(watch.ElapsedMilliseconds);
            }

            watch.Reset();

            var elapsedList2 = new List<long>();

            for (var i = 0;
                 i < 10;
                 i++)
            {
                watch.Restart();

                var query = _rawMessageCollection.Find(x => x.UtcCreated >= dateTimeFrom);

                var countTotal = await query.CountDocumentsAsync();

                var gg = await query.Skip(100).Limit(10).ToListAsync();
                elapsedList2.Add(watch.ElapsedMilliseconds);
            }

            watch.Stop();

The results from the executions are:

Exec.No $facet simple qry.
1 1685 565
2 1733 535
3 1691 546
4 1731 545
5 1823 546
6 1711 546
7 1713 577
8 1751 552
9 1708 541
10 1690 579
AVG 1724 553.2

According to some of the docs. and topics on forums:
In the first implementation, the $facet queries on the same set of data, thus expecting a faster execution.
In the second implementation, we have a query, and do two “round-trips” on the same query.

Questions:

  1. Is the $facet implementation done in a wrong way somehow?
  2. What happens with the query with the second implementation?
    Does the cursor continue to live in between the 1st and 2nd execution, or the query is being executed twice?
    Observing the query before it gets executed:
{find({ "utcCreated" : { "$gt" : ISODate("2022-06-16T00:03:53Z") } })}
  1. Doing the same but in Robo3T shell with a plain JS, the $facet wins with ~10 times shorter execution time over performing a count and skip + limit on the same cursor.

NOTE: Using the $facet way, there is some serialization and model creation time of the classes below (the data property doesn’t come in this calculation since the IEnumerable exists in both cases) , but I believe that time is trivial and can be neglected observing the results of 10 subsequent executions.

        public class RawMessageCountModel
        {
            public IEnumerable<Metadata> Metadata { get; set; }
            public IEnumerable<RawMessageDto> Data { get; set; }
        }

        public class Metadata
        {
            public int total { get; set; }
        }

similar topic:

A few notes.

  1. If you do not sort you might get random results.
  2. I do not think that you do 2 round trips in your simple query. Rather than doing query.Skip(100) redo _rawMessageCollection.Find(…).Skip(100).
  3. Rather than Skip(100), I would Skip( 100 + i*10 ) to do real paging otherwise you always hit the same data.
  4. I would also compare the documents I get between the 2 alternatives to make sure I get the same results.