Difference between the way we create compund hashed index

I’m trying to create a compound hashed index for hash based sharding. I want to know if would there be any difference between the below indexes:

  1. { "id1" : "hashed", "id2" : 1 }
  2. { "id1" : 1, "id2" : "hashed" }

they are different indexes, either one can be used as a hashed sharding index.

Thanks for the response @Kobe_W. I’ve a follow up question.

Lets say id1 is randomly generated fixed length number that takes timestamp as seed. And I’ve another field id2 which is an enum and has a limited number of values, around 10. I’ve mandatory queries involved on filed id2, and quite frequent queries on id1 as well. Now which is a better shard key in this case? Could you please suggest.

Now the question is to select a shard key pattern so that you can get the most benefit.

To answer that you will need to get a rough understanding on the traffic for different query patterns and see which one you sacrifice.

For instance, if you use id1 as shard key, then your “frequent” queries will be very easy (they are targeting queries), but then your “mandatory” queries will have to be broadcast (assume they don’t also include id1 values).

Another option is to use a compound key for both id1 and id2, but in that case your queries need to include both fields in order to be targeting queries.

Yes I wanted to use a compound key. All my queries have a mandatory filter on id2 field. And majority (more than 80%) of queries include a filter on id1 along with the mandatory id2 field.

I believe shard key on id2 field alone won’t be useful as cardinality of it is just 10, and again shard key on id1 alone wouldn’t give me a benefit as all my queries have a mandatory filter on id2 field (please note that id1 is like a uuid and id2 is a enum). So which of the following would be a better shard key in such case?

{id1: hashed, id2: 1}
{id1: 1, id2: hashed}
{id2: hashed, id1: 1}
{id2: 1, id1: hashed}

there’s no need to use a hash index for id2.

my answer would be : either {id1: hashed, id2: 1} or {id2: 1, id1: hashed} should work for you. (unless there are some tricks that are applied internally for building a more efficient index tree).

Most your queries include both fields (as equality check), so i don’t think the ordering matters a lot.

for other queries, they have to be broadcast.

another way is to simply use hashed id1 as the shard key. Given id1 is UUID. i believe it’s no big difference from {id1:hashed + id2}

another thing to consider is that a hashed index doesn’t support a good/targeted range scan.

Right, those 2 shard keys makes sense.

And I think {id2: 1, id1: hashed} would be a better shard key, as id2 can be treated as a index prefix and all my queries that has a compulsory filter on field id2 would use it. What do you think?

that is fair. but given id2 has only 10 values, the benefit is not that big.

This topic was automatically closed after 180 days. New replies are no longer allowed.