What is dirty data? And is there a serverStatus fields description site?

  1. In db.serverStatus().wiredTiger.cache, there is a field called “bytes dirty in the cache cumulative”.
    What does “dirty” mean here?

  2. Is there a serverStatus fields description site?
    Because there are so many fields that are not explained in the manual.

Hi @Kim_Hakseon,

“Dirty” is referring to data that has been modified in the WiredTiger cache but not yet written to the data files via a checkpoint. This general concept is similar to the description of Page cache on Wikipedia.

The cumulative total is a sum of dirty bytes since the mongod process was last started. This isn’t an overly interesting number for diagnostic purposes: the key wiredTiger.cache metrics are typically around cache usage and eviction activity.

Commonly referenced serverStatus output fields are described in the MongoDB server manual. Those are the most useful ones to focus on as a DBA.

There are additional WiredTiger metrics that require more context on storage engine internals, and are generally only interesting to developers working on the core storage engine. The wiredTiger section of serverStatus effectively dumps all of the available metrics from the WiredTiger API.
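Since the section is essentially a flat dump of metric names, one way to explore it is to filter the names by keyword. The following is only a rough sketch: `find_metrics` is a hypothetical helper, and `sample_cache` is a made-up dictionary shaped like `db.serverStatus().wiredTiger.cache`, not real output.

```python
def find_metrics(section, keyword):
    """Return {name: value} for metrics whose name contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return {name: value for name, value in section.items() if keyword in name.lower()}


# Hypothetical sample; the values are invented for illustration only.
# With PyMongo, the real section could be fetched with something like:
#   MongoClient().admin.command("serverStatus")["wiredTiger"]["cache"]
sample_cache = {
    "bytes currently in the cache": 600_000_000,
    "bytes dirty in the cache cumulative": 9_000_000_000,
    "maximum bytes configured": 1_073_741_824,
    "tracked dirty bytes in the cache": 40_000_000,
}

dirty_metrics = find_metrics(sample_cache, "dirty")
for name, value in sorted(dirty_metrics.items()):
    print(f"{name}: {value}")
```

Filtering this way only narrows down the names; what each metric means still has to come from the manual or the WiredTiger sources.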

For more details see the WiredTiger Developer Site and the MongoDB server source code.

Regards,
Stennie

Thank you very much for your kind reply. :smiley:

But I don’t understand what this sentence means. Could you explain it again in more detail?

“Dirty” is referring to data that has been modified in the WiredTiger cache but not yet written to the data files via a checkpoint.

Hi @Kim_Hakseon,

When data is first loaded into the WiredTiger cache it is unmodified or “clean”.

Data that is newly inserted or updated will be journaled on disk and written to the WiredTiger cache. Any modified data will be considered “dirty” in the cache until it is persisted to the data files in the MongoDB data path via a periodic checkpoint. By default, checkpoints run every 60 seconds, but another trigger for checkpoints is the percentage of dirty data. When an application approaches the maximum cache size, WiredTiger begins eviction to stop memory use from growing too large, approximating a least-recently-used algorithm.

After a checkpoint completes, all data that is now consistent with the data files will be marked as clean.
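The clean/dirty lifecycle described above can be sketched with a toy model. This is purely illustrative and is not how WiredTiger is implemented: `ToyCache` and all of its members are hypothetical names, and the "data files" are just a dictionary.

```python
class ToyCache:
    """Toy model of clean vs. dirty data; not WiredTiger's actual design."""

    def __init__(self):
        self.pages = {}       # key -> value currently held in the cache
        self.dirty = set()    # keys modified since the last checkpoint
        self.data_files = {}  # stand-in for the on-disk data files

    def load(self, key):
        """Read a value from the data files into the cache; it starts out clean."""
        self.pages[key] = self.data_files[key]

    def write(self, key, value):
        """Insert or update a value in the cache and mark it dirty."""
        self.pages[key] = value
        self.dirty.add(key)

    def checkpoint(self):
        """Persist every dirty value to the data files, then mark everything clean."""
        for key in self.dirty:
            self.data_files[key] = self.pages[key]
        self.dirty.clear()


cache = ToyCache()
cache.data_files["a"] = 1
cache.load("a")                      # "a" is in the cache and clean
cache.write("b", 2)                  # "b" is in the cache and dirty
dirty_before = set(cache.dirty)
cache.checkpoint()                   # "b" reaches the data files
dirty_after = set(cache.dirty)
print(dirty_before, dirty_after, cache.data_files)
```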

You can monitor cache activity via the wiredTiger.cache metrics in serverStatus:

  • wiredTiger.cache.maximum bytes configured is the maximum cache size
  • wiredTiger.cache.bytes currently in the cache is the total size of data (including clean & dirty)
  • wiredTiger.cache.tracked dirty bytes in the cache indicates modified data that needs to be written to the data files via a checkpoint
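As a rough sketch of how these three fields relate, the snippet below turns them into percentages of the configured maximum. `cache_ratios` is a hypothetical helper, and the sample dictionary uses invented numbers rather than real serverStatus output.

```python
def cache_ratios(cache):
    """Return cache fill and dirty ratios as percentages of the configured maximum."""
    maximum = cache["maximum bytes configured"]
    used = cache["bytes currently in the cache"]
    dirty = cache["tracked dirty bytes in the cache"]
    return {
        "used_pct": 100 * used / maximum,
        "dirty_pct": 100 * dirty / maximum,
    }


# Hypothetical values, chosen only to keep the arithmetic easy to follow.
sample = {
    "maximum bytes configured": 1000,
    "bytes currently in the cache": 800,
    "tracked dirty bytes in the cache": 50,
}

ratios = cache_ratios(sample)
print(ratios)
```

Against a live deployment, the same dictionary could come from PyMongo via `client.admin.command("serverStatus")["wiredTiger"]["cache"]`.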

There’s some further background in the Cache and eviction tuning section of the WiredTiger Developer Site. I would generally leave these settings at their default values unless you have a clear way to test the impact on your workload and a specific performance issue to address.

Regards,
Stennie

Oh, thank you very much for your answer.:smiley: :smiley: :+1:

Please check my understanding after seeing your answer.

  1. When data is inserted or updated, it is “dirty” until a checkpoint flushes it to the data files, either periodically (storage.syncPeriodSecs) or when the cache fills up.

  2. When data is inserted or updated, data is stored in journals and memory simultaneously.

Great answer by @Stennie_X just like always.

@Kim_Hakseon, to clarify your second point.

Journaling is a type of write-ahead logging. In MongoDB (with the WiredTiger storage engine), journaling happens in memory, within the WiredTiger cache. The in-memory journal is an append-only data structure that supports fast writes.

First, the metadata of an insert/update operation is appended to the in-memory journal before the actual in-memory data is updated. After that, the in-memory journal is flushed to disk when one of these conditions is met. The changes are recorded to the journal first so that the database can replay the operations if a crash occurs between checkpoints.
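The general write-ahead idea can be sketched as a toy model. This is a simplification under stated assumptions, not MongoDB's implementation: `ToyStore` and its members are hypothetical, the journal records whole key/value pairs rather than operation metadata, and the journal is assumed to have already been flushed to disk so that it survives the simulated crash.

```python
class ToyStore:
    """Toy write-ahead log: journal first, then memory, with replay on recovery."""

    def __init__(self):
        self.journal = []       # append-only records written since the last checkpoint
        self.memory = {}        # in-memory data
        self.checkpointed = {}  # stand-in for the data files at the last checkpoint

    def write(self, key, value):
        self.journal.append((key, value))  # record the change in the journal first
        self.memory[key] = value           # then update the in-memory data

    def checkpoint(self):
        """Persist the in-memory state; older journal records are no longer needed."""
        self.checkpointed = dict(self.memory)
        self.journal.clear()

    def recover(self):
        """Rebuild state after a crash: last checkpoint plus replayed journal records."""
        self.memory = dict(self.checkpointed)
        for key, value in self.journal:
            self.memory[key] = value


store = ToyStore()
store.write("x", 1)
store.checkpoint()
store.write("x", 2)   # these two writes happen after the checkpoint
store.write("y", 3)
store.memory = {}     # simulate a crash that loses the in-memory data
store.recover()
print(store.memory)
```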

Hope this helps.

Thanks,
Mahi

There was a journal in the cache, too.
I didn’t know.

I could learn more thanks to your very kind reply.
Thank you. :smiley: :+1:
