I have the following two questions about transactions on sharding… Suppose we have a sharded cluster consisting of 10 shards running MongoDB 4.2, and a multi-document transaction that involves 7 of those shards.
(1) How is the coordinator selected? (Please answer in some detail.)
(2) After the coordinator is selected, how does it know which other shards are involved in the transaction, so that it can check their commit/abort status? (As far as I know, shards have no knowledge of the data distribution map.)
MongoDB mongos instances route queries and write operations to shards in a sharded cluster. mongos provide the only interface to a sharded cluster from the perspective of applications. Applications never connect or communicate directly with the shards.
The mongos tracks what data is on which shard by caching the metadata from the config servers. The mongos uses the metadata to route operations from applications and clients to the mongod instances. A mongos has no persistent state and consumes minimal system resources.
So queries and writes get sent to the mongos, and it decides which shards to involve. It does this based on whether the shard key is included in the query.
Write Operations on Sharded Clusters
For sharded collections in a sharded cluster, the mongos directs write operations from applications to the shards that are responsible for the specific portion of the data set. The mongos uses the cluster metadata from the config database to route the write operation to the appropriate shards.
MongoDB partitions data in a sharded collection into ranges based on the values of the shard key. Then, MongoDB distributes these chunks to shards. The shard key determines the distribution of chunks to shards. This can affect the performance of write operations in the cluster.
Update operations that affect a single document must include the shard key or the _id field. Updates that affect multiple documents are more efficient in some situations if they have the shard key, but can be broadcast to all shards.
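To make the targeted-versus-broadcast distinction concrete, here is a toy sketch in Python. It is not how mongos is implemented: the shard names, the `owner_of` modulo mapping, and the `target_shards` helper are all made up for illustration, and only simple equality queries are considered.

```python
from typing import Any, Dict, List

# Hypothetical shard names for a 10-shard cluster.
ALL_SHARDS = [f"shard{i:02d}" for i in range(10)]


def owner_of(shard_key_value: int) -> str:
    """Stand-in for the real chunk lookup (assumption: simple modulo mapping)."""
    return ALL_SHARDS[shard_key_value % len(ALL_SHARDS)]


def target_shards(query: Dict[str, Any], shard_key: str) -> List[str]:
    """Return the shards a router would contact for an equality query."""
    if shard_key in query:
        # Shard key present: the operation can be sent to a single shard.
        return [owner_of(query[shard_key])]
    # Shard key absent: the router has to ask every shard (broadcast).
    return list(ALL_SHARDS)


print(target_shards({"user_id": 42}, "user_id"))
print(len(target_shards({"name": "alice"}, "user_id")))
```

The first call targets one shard because the query carries the shard key; the second contacts all ten because it does not.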
So, from the docs I have shared, and given your situation of 10 shards with 7 affected by the update…
The mongos checks the metadata on the config servers for this information. The config servers store a collection named chunks, which records each chunk's range as an inclusive min and an exclusive max.
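As a rough illustration of the inclusive-min / exclusive-max rule, the lookup can be sketched like this in Python. The `CHUNKS` table, the shard names, and `find_shard` are hypothetical; real chunk metadata uses MinKey/MaxKey bounds and arbitrary shard key types, which this sketch ignores.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical chunk table: (min inclusive, max exclusive, shard name),
# sorted by min and covering the integer key range [0, 300).
CHUNKS: List[Tuple[int, int, str]] = [
    (0, 100, "shardA"),
    (100, 200, "shardB"),
    (200, 300, "shardC"),
]


def find_shard(key: int) -> str:
    """Find the shard whose chunk satisfies min <= key < max."""
    mins = [chunk[0] for chunk in CHUNKS]
    # Index of the last chunk whose min is <= key.
    i = bisect_right(mins, key) - 1
    if i < 0 or key >= CHUNKS[i][1]:
        raise KeyError(f"no chunk covers key {key}")
    return CHUNKS[i][2]


print(find_shard(99))   # falls in [0, 100)
print(find_shard(100))  # 100 is excluded from the first chunk, included in the second
```

The boundary value 100 shows why the convention matters: it belongs to exactly one chunk, so the router never has to guess between two shards.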
Write concern describes the level of acknowledgment requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters. In sharded clusters, mongos instances will pass the write concern on to the shards.
In MongoDB, an operation on a single document is atomic. Because you can use embedded documents and arrays to capture relationships between data in a single document structure instead of normalizing across multiple documents and collections, this single-document atomicity obviates the need for multi-document transactions for many practical use cases.
Have you taken M103: Basic Cluster Administration? This course will help clear up some of your questions. The latest course started on Aug 20, but you can still sign up until Monday.
Having been through the lineup of courses, I was most impressed by M103 and M121! Anything you forgot will, I am sure, come back just like riding a bike! And if not, the forum community is always here to help and support each other!
Truthfully all these courses are fantastic. And Free!
I should have said “coordinator shard” instead of “coordinator”. The term comes from Aly Cabral’s presentation at MongoDB World 2019. I was there and listened to the full talk. She used an example with two involved shards to explain how a transaction is committed or aborted: one of them, called the “coordinator shard”, checks the commit/abort status of the other involved shards and finally issues the acknowledgment. She didn’t mention how the coordinator is selected or how it knows which shards are involved in the transaction. That is where my questions came from.
(I couldn’t find a recording of her presentation on YouTube, and I may be wrong. If you get a chance to review the recording, please send me the URI.)
Thanks for providing the links…
Now I understand how the coordinator is selected.
Back to my second question. In the video, near 1:46, Eliot said: “The client is going to say commit, coordinator’s going to say prepare, get ready. All the shards are going to be we are ready, let’s go, the coordinator says, commit, we are done”.
So the coordinator needs to know which shards in the cluster are involved in the transaction before it can tell them to prepare and then trigger the commit. How does the coordinator know which shards are involved?
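For reference, the prepare/commit exchange in that quote has the shape of a two-phase commit. The toy Python sketch below only illustrates that shape; `Shard`, `coordinate_commit`, and the `can_commit` flag are hypothetical names, not MongoDB internals. Note that the participant list is simply passed in as an argument, so the sketch does not answer where the coordinator gets that list from.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shard:
    """A toy transaction participant (hypothetical, for illustration only)."""
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: vote yes (prepared) or no (aborted).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def coordinate_commit(participants: List[Shard]) -> str:
    """Phase 1: ask every participant to prepare. Phase 2: commit or abort all."""
    votes = [shard.prepare() for shard in participants]
    if all(votes):
        for shard in participants:
            shard.commit()
        return "committed"
    for shard in participants:
        shard.abort()
    return "aborted"


shards = [Shard(f"shard{i}") for i in range(7)]
print(coordinate_commit(shards))
```

If any single participant votes no during prepare, every participant ends up aborted, which is the all-or-nothing behavior the talk describes.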
So Esha explains that the coordinator shard writes an entry to its oplog listing the shards involved in the transaction. How does it know which shards are involved? I honestly do not know for sure (someone more experienced can help me out), but I would guess it is the same way the mongos knows which shard to query or write to: by getting that information from the config servers, which store metadata about the shards in the cluster.
If you go through, or have already been through, the course mentioned by @juliettet (M103), it talks about how replication is done through the oplog:
MongoDB applies database operations on the primary and then records the operations on the primary’s oplog.
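The apply-then-record idea can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: the `Node`, `Primary`, and `Secondary` classes and the `set`/`delete` operation format are invented here and do not match the real oplog entry format.

```python
from typing import Any, Dict, List


class Node:
    """A toy replica set member holding data and an operation log."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.oplog: List[Dict[str, Any]] = []

    def apply(self, op: Dict[str, Any]) -> None:
        # Hypothetical op format: {"op": "set" | "delete", "key": ..., "value": ...}
        if op["op"] == "set":
            self.data[op["key"]] = op["value"]
        elif op["op"] == "delete":
            self.data.pop(op["key"], None)


class Primary(Node):
    def write(self, op: Dict[str, Any]) -> None:
        # Apply the operation first, then record it in the oplog.
        self.apply(op)
        self.oplog.append(op)


class Secondary(Node):
    def __init__(self) -> None:
        super().__init__()
        self.applied = 0  # number of primary oplog entries already replayed

    def sync_from(self, primary: Primary) -> None:
        # Replay only the entries this secondary has not seen yet.
        for op in primary.oplog[self.applied:]:
            self.apply(op)
            self.oplog.append(op)
        self.applied = len(primary.oplog)


primary, secondary = Primary(), Secondary()
primary.write({"op": "set", "key": "x", "value": 1})
primary.write({"op": "delete", "key": "x"})
primary.write({"op": "set", "key": "y", "value": 2})
secondary.sync_from(primary)
print(secondary.data == primary.data)
```

Replaying the log in order is what lets the secondary converge on the same data as the primary.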
And after watching that video, it seems that is how the shards ‘communicate’ when dealing with distributed transactions. And the only place information like what you’re asking about