MongoDB check BSON size of collections

When I check the slow queries in my database, I find the following aggregate query being executed on multiple collections:

[
  {
    "$addFields": {
      "temporary_doc_size_field": {
        "$bsonSize": "$$ROOT"
      }
    }
  },
  {
    "$match": {
      "temporary_doc_size_field": {
        "$lt": 16793600
      }
    }
  },
  {
    "$project": {
      "temporary_doc_size_field": 0
    }
  },
  {
    "$sample": {
      "size": 1000
    }
  }
]

It performs a collection scan and generates a lot of disk reads. I checked my application and couldn't find this query anywhere in the source code.

So, is this aggregate query automatically executed by MongoDB, and what is its purpose? Is there any way to limit it because it is quite costly in terms of disk I/O?

Where and how are you running mongod?

This is a really bad query, and I am 99.999999% sure that MongoDB does not execute it automatically, especially not on multiple collections.
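If you can catch it while it is running, db.currentOp() will show which client issued it. A minimal sketch in mongosh (assuming you have the privileges to run currentOp):

// List in-progress aggregate commands together with the issuing client.
db.currentOp({
  active: true,
  "command.aggregate": { $exists: true }
}).inprog.forEach(op => {
  printjson({
    ns: op.ns,                     // namespace being scanned
    appName: op.appName,           // driver/tool name, e.g. "MongoDB Compass"
    client: op.client,             // client host:port
    secs_running: op.secs_running  // how long it has been running
  });
});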

I think the value 16_793_600 sits just above the 16 MB BSON document size limit, so every document will match.
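For reference, the arithmetic (evaluated here in mongosh, but it is the same anywhere):

16 * 1024 * 1024     // 16777216 bytes: the BSON document size limit
16793600 - 16777216  // 16384 bytes: the $match threshold sits 16 KB above the limit

So no document that MongoDB can actually store is filtered out by that $match stage.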

How is your system secured?

Currently, it is only possible to access MongoDB from the local area network (LAN). So if this was a hack, it would be really bad.

Maybe you have some users analyzing the schema with Compass. The $sample stage with size: 1000 seems to match the remark that Compass prints when analyzing a schema:

This report is based on a sample of 1000 documents.
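If you can enable the profiler, it records the appName of the connection that issued each captured operation, and Compass reports itself as "MongoDB Compass". A sketch in mongosh, with "mydb" as a placeholder for the affected database:

// Capture slow operations, then look at who issued recent aggregations.
const target = db.getSiblingDB("mydb");        // placeholder database name
target.setProfilingLevel(1, { slowms: 100 });  // profile operations slower than 100 ms

target.system.profile.find(
  { "command.aggregate": { $exists: true } },
  { appName: 1, client: 1, millis: 1, ts: 1 }
).sort({ ts: -1 }).limit(5);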

Are the queries spaced evenly over time? In principle, an automated process would run the queries at regular intervals, while a human would show more random intervals.
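One way to check, if the profiler entries from the sketch above are available:

// Compute gaps between successive captures of the aggregation in
// system.profile; evenly spaced gaps suggest an automated process.
const times = db.getSiblingDB("mydb")  // placeholder database name
  .system.profile.find(
    { "command.aggregate": { $exists: true } },
    { ts: 1 }
  ).sort({ ts: 1 }).toArray().map(d => d.ts.getTime());

for (let i = 1; i < times.length; i++) {
  print(`gap ${i}: ${(times[i] - times[i - 1]) / 1000} s`);
}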

@Vinh_Ph_m_Quang, is there any follow-up on this?

The slow-query monitoring system is currently paused for an upgrade, so I am unable to monitor the queries at the moment.