How many RPUs does MongoDB Atlas use to find random records using aggregate?

tri_be · September 27, 2022, 4:04am

In a collection I have about 10 million documents.

I use this code to find 20 random documents:

db.mycoll.aggregate([{ $sample: { size: 20 } }])

How many RPUs does MongoDB Atlas need to do this?

Jason_Tran · October 19, 2022, 9:38pm

Hi @tri_be - Welcome to the community

How many RPU MongoDB Atlas needs to do this?

I would recommend going over the Serverless - Usage Cost Summary documentation. With regard to RPUs specifically (as of the time of this message):

You are charged one RPU for each document read (up to 4KB) or for each index read (up to 256 bytes).

So in terms of RPUs for your question, one of the factors you will need to consider is the size of each document and index read.
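To make the arithmetic concrete, here is a rough sketch of how read sizes could translate into RPUs. The `estimateRpu` helper is hypothetical (it is not an Atlas API), and it assumes that a read larger than the quoted limit is billed in further 4KB (document) or 256-byte (index) increments, which is an assumption on top of the sentence quoted above. Please treat the result as an estimate only and check actual usage on your invoice.

```javascript
// Hypothetical estimator, not part of any MongoDB or Atlas API.
// Assumption: reads above the quoted limits are billed in additional
// 4KB (document) or 256-byte (index) increments.
const DOC_UNIT_BYTES = 4 * 1024; // one RPU per document read, up to 4KB
const INDEX_UNIT_BYTES = 256;    // one RPU per index read, up to 256 bytes

function estimateRpu(docSizesBytes, indexReadSizesBytes = []) {
  // Every read costs at least one unit, then rounds up per increment.
  const units = (size, unit) => Math.max(1, Math.ceil(size / unit));
  const docRpu = docSizesBytes.reduce(
    (sum, size) => sum + units(size, DOC_UNIT_BYTES), 0);
  const indexRpu = indexReadSizesBytes.reduce(
    (sum, size) => sum + units(size, INDEX_UNIT_BYTES), 0);
  return docRpu + indexRpu;
}

// Example: 20 sampled documents of 1KB each, no index reads.
console.log(estimateRpu(Array(20).fill(1024)));
```

Note that this only counts the documents you pass in. If the server has to read more documents than it returns (for example during a collection scan), those reads would need to be included as well.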

In a collection I have about 10 million documents.
db.mycoll.aggregate([{ $sample: { size: 20 } }])

Depending on several conditions, the $sample stage either uses a pseudo-random cursor or reads every input document (a COLLSCAN, or all documents output by the preceding aggregation stage). As per the documentation linked:

If all of the following conditions are true, $sample uses a pseudo-random cursor to select the N documents:

$sample is the first stage of the pipeline.

N is less than 5% of the total documents in the collection.

The collection contains more than 100 documents.

If any of the previous conditions are false, $sample:

Reads all documents that are output from a preceding aggregation stage or a collection scan.

Performs a random sort to select N documents.
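The conditions above can be sketched as a small check. The `sampleUsesPseudoRandomCursor` function and its parameter names are hypothetical and only mirror the three documented conditions; the server makes the real decision, so this is illustrative rather than authoritative.

```javascript
// Hypothetical helper mirroring the documented $sample conditions.
// It does not query MongoDB; the caller supplies all three values.
function sampleUsesPseudoRandomCursor({ isFirstStage, sampleSize, totalDocs }) {
  return (
    isFirstStage &&                  // $sample is the first pipeline stage
    sampleSize < 0.05 * totalDocs && // N is less than 5% of the collection
    totalDocs > 100                  // collection has more than 100 documents
  );
}

// The case from this thread: 20 documents out of about 10 million.
console.log(sampleUsesPseudoRandomCursor({
  isFirstStage: true,
  sampleSize: 20,
  totalDocs: 10000000,
}));
```

For the pipeline in the question, all three conditions hold, so the pseudo-random cursor path would be expected. In mongosh, `totalDocs` could come from `db.mycoll.estimatedDocumentCount()`.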

Whether RPU usage is higher with the pseudo-random cursor than without it will vary on a case-by-case basis.

As serverless costs may be a concern to you, you may wish to set up a billing alert.

Regards,
Jason