Indexes on sharded collection

Nithin_Kumar · March 17, 2022, 9:54am

Our collections are sharded based on tenantId.

Is it good practice to create another single index on a particular field instead of creating compound index containing tenantId and that particular field

Pavel_Duchovny · March 17, 2022, 12:40pm

Hi @Nithin_Kumar ,

Well it depands on the nature of the queries and what is the optimal index to support them.

However, if you don’t include the tenantId prefix in your queries and your compound index prefix you will probably end up with a scatter gather (distrebuted) query that will essentially send the query to all of the shards and wait until all of them respond :

https://docs.mongodb.com/manual/core/distributed-queries/#read-operations-to-sharded-clusters

This can be a performance problem and you should avoid it if possible…

Thanks
Pavel

Nithin_Kumar · March 17, 2022, 3:58pm

iterable = db.collection.find(’{field:value}’, limit;1000);
iterable.foreach{ doc →
doc.update( ‘{field:differentValue}’
}
I did a perf test on the query for 500K documents in a 2 sharded cluster.
time to fetch the records every 1000 documents is ~1second.

Pavel_Duchovny · March 18, 2022, 8:17am

@Nithin_Kumar ,

That doesn’t sound very good…

Wat are the indexes and shard keys?

Nithin_Kumar · March 18, 2022, 8:21am

_id : { _tenantId: “blah”, _id: }
fieldOfInterest: someVal

shardkey: [ _id._tenantId ]
index: [ _id._tenantId, _id._id ]
index:[ fieldOfInterest ]

Pavel_Duchovny · March 18, 2022, 11:05am

@Nithin_Kumar ,

I don’t think I fully understand. You have an _id that is not an objectId but a complex object?

{ _id : { __tenantId : ... , _id : ... } ...

Is that correct?

Now when you query on this non sharding field the query is routed and gathered through all shards on a mongos… Therefore not super perfomant…

You can’t see a logic of using a shard key prefix before this query :

db.collection.find ({_id.__tenantid : "blah", valueOfIntrest : ... })

Index : {_id.__tenantId : 1 , valueOfIntrest: 1}

Ty

Nithin_Kumar · March 18, 2022, 11:24am

yes the _id is complex object.

This type of query is not done often.
iterable = db.collection.find(’{fieldOfInterest:value}’, limit;1000);

so probably we will live with it. No, we dont have a way to get all the tenantIds.