Question about the additional fields of a compound hashed shard key

Since version 4.4, MongoDB supports compound hashed shard keys, but I cannot find any documentation that explains the role of the additional fields in the shard key. For example, my shard key is:

{oemNumber: "hashed", zipCode: 1, supplierId: 1}

What do the zipCode and supplierId fields do, and how do they help with sharding? Thanks! :blush:

Hi @_Jin and welcome to the community.

The following documentation on compound shard keys is useful for understanding how sharding on compound indexes works.

Adding more fields to a compound shard key is useful when one of the fields is monotonically increasing. A monotonically increasing shard key on its own can be a problem: inserts may all be directed to a “hot shard”, where one shard does all the insertion work while the other shards do not participate in the workload, which reduces the benefit of using a sharded cluster in the first place.

Thus, to answer your question: combining a hashed field with other fields in a compound shard key helps with the shard key’s cardinality and can potentially also help with range and sorting queries.
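As a rough illustration of the cardinality point, here is a minimal sketch in plain JavaScript (it runs in Node.js and does not talk to MongoDB). The `toyHash` function is only a stand-in for MongoDB's real hashed-index function, and the sample documents are hypothetical. The assumption is that only the field marked "hashed" is hashed, while the other fields keep their original values in the shard key.

```javascript
// Toy stand-in for MongoDB's hashed-index function (assumption: the real
// function is different; here we only need "same input, same number").
function toyHash(value) {
  let h = 0;
  for (const ch of String(value)) {
    h = (h * 31 + ch.codePointAt(0)) | 0; // keep it a 32-bit integer
  }
  return h;
}

// Hypothetical sample documents: three of them share the same oemNumber.
const docs = [
  { oemNumber: "ABC", zipCode: 12345, supplierId: 10 },
  { oemNumber: "ABC", zipCode: 67890, supplierId: 15 },
  { oemNumber: "ABC", zipCode: 67890, supplierId: 20 },
  { oemNumber: "XYZ", zipCode: 12345, supplierId: 10 },
];

// Shard key value under { oemNumber: "hashed" }.
const hashedOnly = (d) => JSON.stringify([toyHash(d.oemNumber)]);

// Shard key value under { oemNumber: "hashed", zipCode: 1, supplierId: 1 }:
// only the first field is hashed, the other two are kept as-is.
const compoundHashed = (d) =>
  JSON.stringify([toyHash(d.oemNumber), d.zipCode, d.supplierId]);

// Count how many distinct shard key values the sample data produces.
const distinct = (keyFn) => new Set(docs.map(keyFn)).size;

console.log(distinct(hashedOnly));     // 2
console.log(distinct(compoundHashed)); // 4
```

With the hashed field alone, all "ABC" documents collapse to a single shard key value. With the two extra fields, each document in this sample has its own shard key value.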

Please let us know if you have any other questions.

Thanks
Aasawari


@Aasawari Thank you for your response, but it is still not clear to me. For example, I have a shard key like this:

{oemNumber: 'hashed', zipCode: 1, supplierId: 1}

Sample data like:

[
   {
        oemNumber: "ABC",
        zipCode: 12345,
        supplierId: 10
   },
   {
        oemNumber: "ABC", // same oemNumber
        zipCode: 67890,
        supplierId: 15
   }
]

Based on my understanding, for the key above, the oemNumber field is hashed, like this:

hashFunc("ABC")  // ===> 123456789

The two records above have the same value ("ABC"), so the hash function maps 'ABC' to ONE fixed number, e.g. 123456789. Because the hashed values are the SAME, the two records will be put into one chunk on one shard.

In this case a monotonically increasing shard key is an issue, yes, but even if we add more fields (zipCode, supplierId), the hash is still generated only from the value 'ABC'.

Because we can hash only one field (oemNumber), I don’t understand what role “zipCode” and “supplierId” play in the hash function. OR do you mean that the hash function accepts ALL 3 parameters, like this:

hashFunc(oemNumber, zipCode, supplierId)

If it accepts 3 parameters, then why can we define only 1 hashed field [ oemNumber: 'hashed' ]? Why don’t we define the shard key like this:

{oemNumber: 'hashed', zipCode: 'hashed', supplierId: 'hashed'}

Please help me understand, thank you!

Hi @_Jin

To understand the complete concept of a compound hashed shard key, you need to understand the basic concepts of shard keys, compound indexes and hashed shard keys.

  1. Compound Index: A compound index key selected as the shard key is beneficial only when the cardinality of the key is high, so that fewer issues occur with jumbo chunks and indivisible chunks.
  2. Hashed Shard key: This resolves the issue of monotonically increasing shard key values, which generally results in hot chunks and limits the benefits of a sharded collection.
    A shard key should typically divide the collection into chunks that are spread across the shards, to maximise the benefits of parallelism. Please find more details in the documentation for Hashed Sharding.

Yes, in the case mentioned above, the documents will be put into one chunk, and hence it is recommended to use a shard key with high cardinality, i.e. one that has a large number of distinct values.

If oemNumber is not monotonically increasing and does not have good cardinality, hashing the field would not be the right choice.

Also, hashing multiple fields of a compound shard key is not possible as of today.

If you wish to know more about the concepts of sharding and shard keys, please take our MongoDB University course M103: Basic Cluster Administration.

Let us know if you have any further questions.

Thanks
Aasawari


Thank you for responding @Aasawari

{oemNumber: 'hashed', zipCode: 1, supplierId: 1}

So that means the zipCode and supplierId fields in the shard key above have no meaning for sharding, right? They are just used in the compound index. Do I understand it correctly?

Looking only at sharding, the 2 shard keys below are the same, right?

{oemNumber: 'hashed', zipCode: 1, supplierId: 1}

and

{oemNumber: 'hashed'}

Hi @_Jin

No. As I previously mentioned, a hashed compound shard key is a combination of a compound index, a shard key, and a hashed index. It combines the concepts of all three. Please refer to the documentation referenced earlier.

No, they are not the same: the former defines a compound hashed shard key and the latter a hashed shard key.

I would reiterate my earlier suggestion about enrolling in the M103 MongoDB University course. I believe it would be very helpful in your MongoDB journey :slight_smile:
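To make the difference a bit more concrete, here is a minimal sketch in plain JavaScript (runnable with Node.js, no cluster needed). It is only a simulation under stated assumptions: `toyHash` is a stand-in for MongoDB's real hash function, the shard names and chunk ranges are hypothetical, and `-Infinity` / `Infinity` stand in for MinKey / MaxKey. It assumes that chunk ranges are defined over the whole shard key tuple (hashed oemNumber, zipCode, supplierId), compared field by field.

```javascript
// Toy stand-in for MongoDB's hashed-index function (assumption: the real
// function is different; only "same input gives same number" matters here).
function toyHash(value) {
  let h = 0;
  for (const ch of String(value)) {
    h = (h * 31 + ch.codePointAt(0)) | 0;
  }
  return h;
}

// Shard key value for { oemNumber: "hashed", zipCode: 1, supplierId: 1 }.
const shardKeyOf = (doc) => [toyHash(doc.oemNumber), doc.zipCode, doc.supplierId];

// Compare two key tuples field by field, the way a compound index orders them.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Hypothetical chunk ranges [min, max). The split point lies *inside* the
// hash value of "ABC", at zipCode 50000.
const h = toyHash("ABC");
const chunks = [
  { shard: "shardA", min: [-Infinity, -Infinity, -Infinity], max: [h, 50000, -Infinity] },
  { shard: "shardB", min: [h, 50000, -Infinity], max: [Infinity, Infinity, Infinity] },
];

// Find the chunk whose range contains the document's shard key.
function route(doc) {
  const key = shardKeyOf(doc);
  const chunk = chunks.find(
    (c) => compareKeys(c.min, key) <= 0 && compareKeys(key, c.max) < 0
  );
  return chunk.shard;
}

const docA = { oemNumber: "ABC", zipCode: 12345, supplierId: 10 };
const docB = { oemNumber: "ABC", zipCode: 67890, supplierId: 15 };

console.log(route(docA)); // shardA
console.log(route(docB)); // shardB
```

In this sketch both documents have the same hashed oemNumber, yet a split point on zipCode can place them in different chunks. With {oemNumber: 'hashed'} alone, both documents would have exactly the same shard key value, so no split point could ever separate them.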

Thanks
Aasawari

Many thanks @Aasawari
I will read more.
