Hey @Ping_Pong,
Please note that we recommend sharding once an individual collection grows beyond 2 TB. You can refer to the Performance Best Practices blog for more information.
_id is the primary key in MongoDB, and by default its value is an ObjectId. An ObjectId is a 12-byte value (usually displayed as a 24-character hexadecimal string) that is likely to be globally unique and guaranteed to be unique per collection when auto-generated. At 12 bytes, it is smaller than a typical universally unique identifier (UUID), which is 128 bits (16 bytes). Beginning in MongoDB 3.4, an ObjectId consists of the following values:
- 4-byte value representing the seconds since the Unix epoch,
- 5-byte random value, and
- 3-byte counter, starting with a random value.
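To make the layout concrete, here is a minimal sketch in plain Python (no driver needed) that splits an ObjectId's hex string into the three fields listed above. The helper name split_object_id and the sample id are only illustrative assumptions, not part of any MongoDB API.

```python
from datetime import datetime, timezone


def split_object_id(hex_id: str) -> dict:
    """Split a 24-character ObjectId hex string into its three fields.

    Hypothetical helper for illustration; the official drivers expose
    their own ObjectId types.
    """
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # 4-byte timestamp, big-endian
    return {
        "seconds": seconds,
        "created_utc": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random_hex": raw[4:9].hex(),                 # 5-byte random value
        "counter": int.from_bytes(raw[9:12], "big"),  # 3-byte counter
    }


# Sample id chosen only for illustration.
parts = split_object_id("507f1f77bcf86cd799439011")
print(parts["created_utc"].isoformat())  # 2012-10-17T21:13:27+00:00
print(parts["random_hex"], parts["counter"])
```

Because the timestamp is stored in the leading bytes, ObjectIds generated later compare greater than ones generated earlier, which matters when _id is used as a shard key.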
The odds of two ObjectIds being the same would be roughly 1 in 18,446,744,073,709,551,616. We arrive at this value because there are eight bits in a byte and eight random bytes in an ObjectId (the 5-byte random value plus the 3-byte random starting value of the counter), making the denominator of the odds ratio 2^(8*8) = 2^64, or about 1.84467441 × 10^19.
As such, it is possible that there could be duplication of values, but it is highly unlikely.
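If you want to check the arithmetic yourself, a few lines of Python reproduce it. This sketch only counts the possible values of the eight random bytes, under the assumption that they are independent and uniformly random.

```python
# 5-byte random value + 3-byte random starting value of the counter
RANDOM_BYTES = 5 + 3
BITS_PER_BYTE = 8

# Number of distinct values the random part can take: 2^(8*8) = 2^64
combinations = 2 ** (RANDOM_BYTES * BITS_PER_BYTE)

print(f"{combinations:,}")    # 18,446,744,073,709,551,616
print(f"{combinations:.8e}")  # 1.84467441e+19
```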
For more information on MongoDB’s ObjectId data type, I recommend reviewing the following blog posts, as they cover this topic in greater detail:
- “Quick Start: BSON Data Types - ObjectId”
- “Generating Globally Unique Identifiers for Use with MongoDB”
A shard key can be a single field or a combination of multiple fields in the collection. It has to be indexed, and it does not need to be unique, but it should have high cardinality. You can read more about it in the shard key documentation.
You can choose _id as the shard key, but there are complications: because ObjectId values increase over time, the most recent data all goes to a single shard, while older data resides on the other shards. You can read the blog post on selecting a shard key to choose the best shard key for your sharded cluster.
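The single-shard hotspot can be illustrated with a small simulation. This is only a rough sketch under stated assumptions: fake_object_id, the four shards, the fixed chunk boundaries, and the md5-modulo routing are all made up for illustration. A real cluster splits and migrates chunks over time, and MongoDB's hashed sharding uses its own hash function and chunk ranges rather than a modulo.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

NUM_SHARDS = 4  # assumed cluster size, for illustration only


def fake_object_id(seconds: int, counter: int) -> bytes:
    """Build a 12-byte ObjectId-like key: timestamp, fixed 'random' bytes, counter."""
    return (
        seconds.to_bytes(4, "big")
        + b"\x01\x02\x03\x04\x05"
        + (counter % 2**24).to_bytes(3, "big")
    )


# Existing documents, split evenly into one key range per shard.
existing = [fake_object_id(1_700_000_000 + i, i) for i in range(1000)]
boundaries = [existing[len(existing) * k // NUM_SHARDS] for k in range(1, NUM_SHARDS)]


def ranged_shard(key: bytes) -> int:
    """Route by key range, as with a plain {_id: 1} shard key."""
    return bisect_right(boundaries, key)


def hashed_shard(key: bytes) -> int:
    """Route by a hash of the key (md5 here is a stand-in, not MongoDB's hash)."""
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big") % NUM_SHARDS


# New documents always carry later timestamps than the existing ones.
new_ids = [fake_object_id(1_700_001_000 + i, 1000 + i) for i in range(1000)]

ranged = Counter(ranged_shard(k) for k in new_ids)
hashed = Counter(hashed_shard(k) for k in new_ids)

print("ranged:", dict(ranged))  # every new insert lands on the last shard
print("hashed:", dict(hashed))  # inserts are spread across the shards
```

With range-based routing every new key sorts after the last boundary, so one shard takes all the writes. Hashing the key spreads inserts out, at the cost of turning range queries on _id into scatter-gather operations, which is one of the trade-offs to weigh when picking a shard key.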