The Journey of 100DaysofCode aka 100DaysofMongoDB (@Aasawari_24)

Day 19 of #100DaysofMongoDB as #100daysofcode

Continuing with the WiredTiger concepts here:

DataHandle and Btrees
The data handle, also known as a dhandle, represents a B-tree. Dhandles are created when a collection is created and destroyed when they are no longer in use.
A dhandle contains the following information:

  1. Name
  2. References to the global data list
  3. Statistical data
  4. Type of underlying data object.

The lifecycle of a dhandle comprises three stages:

  1. dhandle creation: When a cursor attempts to access a table, it looks for the dhandle in the session's dhandle cache first and then in the global list. If the dhandle is not present, one is created and added to the global list, with a reference to it placed in the cache.
    Looking up a dhandle requires the read lock on the global list; adding a new dhandle requires the write lock.
    Each dhandle carries two counters:

  • session_ref: counts the number of session dhandle cache lists that contain the dhandle.
  • session_inuse: counts the cursors that are open and operating on the dhandle.

  2. dhandle cache sweep: Dhandles that have not been used for a long period are removed from the session's dhandle cache.

  3. sweep server:
    If session_inuse is 0, the server compares the configured idle time with the time elapsed since the dhandle was last used. A dhandle that has been idle for longer is marked as dead and some of its resources are released.
    If session_ref is also 0, meaning the dhandle is not referenced by any session, the server removes it from the global list and frees the remaining resources.
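
To make the lifecycle easier to follow, here is a minimal Python sketch of it. This is only a toy model and not WiredTiger's actual C implementation: the class names, the `now` parameters, the single `threading.Lock` standing in for the read/write lock, and the `idle_seconds` setting are all assumptions made for illustration. It also assumes that the sweep server marks a dhandle dead when no cursor is using it and removes it once no session cache references it.

```python
import threading
from dataclasses import dataclass


@dataclass
class DHandle:
    name: str
    session_ref: int = 0    # session caches that contain this dhandle
    session_inuse: int = 0  # cursors currently open on this dhandle
    last_used: float = 0.0
    dead: bool = False


class Connection:
    def __init__(self, idle_seconds: float):
        self.global_list = {}          # name -> DHandle
        self.lock = threading.Lock()   # stand-in for the read/write lock
        self.idle_seconds = idle_seconds


class Session:
    def __init__(self, conn: Connection):
        self.conn = conn
        self.cache = {}                # this session's dhandle cache

    def open_cursor(self, name: str, now: float) -> DHandle:
        dh = self.cache.get(name)
        if dh is not None and dh.dead:
            # A dead handle cannot be reused; drop it from the cache.
            del self.cache[name]
            dh.session_ref -= 1
            dh = None
        if dh is None:
            with self.conn.lock:
                # Cache miss: look in the global list, create if absent.
                dh = self.conn.global_list.get(name)
                if dh is None or dh.dead:
                    dh = DHandle(name)
                    self.conn.global_list[name] = dh
            self.cache[name] = dh
            dh.session_ref += 1
        dh.session_inuse += 1
        dh.last_used = now
        return dh

    def close_cursor(self, dh: DHandle, now: float) -> None:
        dh.session_inuse -= 1
        dh.last_used = now

    def cache_sweep(self, now: float) -> None:
        # Drop dead or long-idle dhandles from this session's cache.
        for name, dh in list(self.cache.items()):
            idle = now - dh.last_used > self.conn.idle_seconds
            if dh.dead or (dh.session_inuse == 0 and idle):
                del self.cache[name]
                dh.session_ref -= 1


def sweep_server(conn: Connection, now: float) -> None:
    with conn.lock:
        for name, dh in list(conn.global_list.items()):
            if dh.session_inuse == 0 and now - dh.last_used > conn.idle_seconds:
                dh.dead = True         # closed; some resources released
            if dh.dead and dh.session_ref == 0:
                del conn.global_list[name]  # remaining resources freed
```

In this sketch a dhandle only leaves the global list after both counters have dropped to 0, which is why the session's cache sweep has to run before the sweep server can finish the job.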

Eviction:
This is the process of removing old data from the cache. It is carried out by a dedicated set of eviction threads. Applications cannot trigger it directly through API calls.

File System / Operating System Interface:
An abstraction layer that lets the mainline WiredTiger code make calls to a single interface for file system and operating system operations.

History Store:
This holds older versions of records and is used to service long-running transactions.

Logging:
When configured, WiredTiger keeps a write-ahead log. Its sole purpose is to retain the changes made after the last checkpoint, which helps with recovery after a crash.
There are three log-related files:

  1. WiredTigerLog.*: every log file created gets a 10-digit numeric suffix.
  2. WiredTigerTmpLog: a temporary file that contains the header content. Once all of its data is synced to disk, it is renamed to WiredTigerPrepLog.
  3. WiredTigerPrepLog: the name the temporary file takes after that rename.
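
A small Python sketch may help picture the naming and the recovery idea. It is only an illustration under assumptions: `log_file_name` and `recover` are hypothetical helpers (not WiredTiger functions), the suffix is assumed to be zero-padded, and a log record is modeled as a simple (key, value) pair.

```python
def log_file_name(log_number: int) -> str:
    # Assumed format: "WiredTigerLog." followed by a 10-digit, zero-padded number.
    return f"WiredTigerLog.{log_number:010d}"


def recover(checkpoint_state: dict, log_records: list) -> dict:
    # Start from the last checkpoint and replay the changes logged after it.
    state = dict(checkpoint_state)
    for key, value in log_records:
        state[key] = value
    return state


checkpoint = {"a": 1, "b": 2}
changes_after_checkpoint = [("b", 20), ("c", 30)]
recovered = recover(checkpoint, changes_after_checkpoint)
```

Here `recovered` ends up holding the checkpoint contents plus the logged changes, which is the role the write-ahead log plays after a crash.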

Metadata:
This is a key/value store in which the key is the uri string and the value is a configuration string. The configuration string itself contains further key/value pairs that describe the data encoding for that uri.
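
As a rough sketch of what such an entry looks like, the Python below splits a configuration string into its key/value pairs. The `parse_config` helper and the sample entry are assumptions for illustration; real configuration strings can be more involved than this simple parser handles.

```python
def parse_config(config: str) -> dict:
    # Split on commas that are not inside parentheses, then on the first "=".
    items, depth, current = [], 0, ""
    for ch in config:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            items.append(current)
            current = ""
        else:
            current += ch
    if current:
        items.append(current)

    result = {}
    for item in items:
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


# Hypothetical metadata entry: uri string -> configuration string.
metadata = {"table:mytable": "key_format=S,value_format=S,columns=(id,name)"}
parsed = parse_config(metadata["table:mytable"])
```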

Row Stores:
These are B-trees without a record id.

Schema:
This defines the format of the application data.

Snapshots:
They are implemented by storing the set of transaction ids that were committed before the transaction started.
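
The idea can be sketched in a few lines of Python. This is a simplified model that follows the description above and is not WiredTiger's internal layout; `Snapshot`, `take_snapshot` and `is_visible` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    committed_ids: frozenset  # transaction ids committed before we started


def take_snapshot(committed_so_far) -> Snapshot:
    # Freeze the set so later commits do not change what we can see.
    return Snapshot(frozenset(committed_so_far))


def is_visible(snapshot: Snapshot, writer_txn_id: int) -> bool:
    # An update is visible only if its writer committed before the snapshot.
    return writer_txn_id in snapshot.committed_ids


committed = {1, 2, 3}
snap = take_snapshot(committed)
committed.add(4)  # transaction 4 commits after the snapshot was taken
```

With this model, updates from transactions 1 to 3 are visible through `snap`, while the update from transaction 4 is not.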

Rollback:
This keeps only the modifications that are stable according to the stable timestamp and the recovered checkpoint snapshot.
It scans all tables except the metadata.
It runs in three phases:

  • WT startup
  • WT shutdown
  • Application initiated

The prerequisite for a rollback is that there must NOT be any transaction activity happening in WiredTiger.
The checks performed on each table include:

  • Table is modified
  • The checkpoint durable start/stop timestamp is greater than the rollback timestamp.
  • There is no durable timestamp in any checkpoint.
  • Has prepared updates
  • Has updates from transactions greater than checkpoint snapshot (only in restart phase)
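
The checks above can be pictured as a single predicate over a table. The Python below is only a sketch: the field names are hypothetical, a missing durable timestamp is modeled as `None`, and it assumes that any one of the conditions is enough for the table to be processed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableInfo:
    modified: bool
    durable_ts: Optional[int]  # None models "no durable timestamp in any checkpoint"
    has_prepared_updates: bool
    max_txn_id: int            # largest transaction id with updates in the table


def needs_rollback(table: TableInfo, rollback_ts: int,
                   checkpoint_snapshot_max: int, restart_phase: bool) -> bool:
    if table.modified:
        return True
    if table.durable_ts is None:
        return True
    if table.durable_ts > rollback_ts:
        return True
    if table.has_prepared_updates:
        return True
    # This check applies only in the restart phase.
    if restart_phase and table.max_txn_id > checkpoint_snapshot_max:
        return True
    return False


clean = TableInfo(modified=False, durable_ts=5,
                  has_prepared_updates=False, max_txn_id=10)
newer = TableInfo(modified=False, durable_ts=50,
                  has_prepared_updates=False, max_txn_id=10)
```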

This describes WiredTiger from an architectural perspective.

Thanks
Aasawari

Share on twitter: https://twitter.com/Aasawari_24