Cannot get documents from some shards when executing find() command in sharded cluster

tayfun_yalcinkaya · December 8, 2021, 7:53pm

Hi;
We have a sharded cluster with 3 shards; each shard has a Primary, a Secondary and an Arbiter. We collect data from the oplog of each shard in 3 different Java code blocks concurrently. We then parse the oplog data to get the unique document id. There are no problems up to this point.

After getting the unique document id from the parsed oplog data, we execute a find() command in 3 different Java code blocks. This code should return the current document with the queried id. While one shard responds with documents, the others cannot find any document with the queried id.

In summary, each shard has an oplog reader and a find() command executor. Why can we not get the document from two of the 3 shards? Does Mongo lock two of them? We want to understand Mongo's behaviour.

We give the Primary IPs of each shard in the connection string when executing the find() command. We also tried giving the Mongos IP address in the connection string, but we continued to get null documents, and very slowly. The code blocks that include the find() command execution are below.

Connection string used:

private void createConnection(String mongoUsername, String mongoPassword, String authDB,
        String MongoDBName, String mongoConnectionURL, String mongoCollectionName) {
    // Build the URI from the credentials, the host list and the authentication database.
    MongoClientURI uri = new MongoClientURI("mongodb://" + mongoUsername + ":" + mongoPassword
            + "@" + mongoConnectionURL + "/" + authDB + "?authMechanism=SCRAM-SHA-1");
    MongoClient mongoClient = new MongoClient(uri);
    MongoDatabase database = mongoClient.getDatabase(MongoDBName);
    path_collection = database.getCollection(mongoCollectionName);
}

Find command execution:

public Document findNode(String binarydata, String operationType) {
    // The id arrives Base64-encoded; decode it and wrap it as BSON binary subtype 3.
    BasicDBObject whereQuery = new BasicDBObject();
    whereQuery.put("_id", new Binary((byte) 3, Base64.getDecoder().decode(binarydata)));
    FindIterable<Document> mongo_node = path_collection.find(whereQuery);
    // first() returns null when nothing matches; next() on an empty cursor
    // would throw NoSuchElementException instead.
    Document doc = mongo_node.first();
    System.out.println("doc" + doc);
    return doc;
}

Thank you.

Prasad_Saya · December 9, 2021, 5:26am

Hello @tayfun_yalcinkaya, welcome to the forum!

Sharding distributes data among shards. This applies to sharded collections only. Note that a sharded cluster can have sharded as well as un-sharded collections.

If your collection is not sharded, all of its data is on the Primary Shard. A sharded collection has its data distributed across multiple shards.

To connect to and query any of the collections in a sharded database, you connect via the mongos router and its connection URI.
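As a rough sketch of what such a connection URI can look like, the snippet below assembles one in plain Java, without the driver. The host names, user and password are hypothetical, and `MongosUriSketch` and `buildUri` are illustrative names, not driver APIs. It assumes the standard connection string format, in which several mongos hosts may be listed separated by commas and special characters in the user name and password must be percent-encoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Illustrative helper (hypothetical name): builds a mongos connection URI string.
class MongosUriSketch {

    // Percent-encode a credential so characters such as '@' or ':' cannot break the URI.
    static String encode(String value) {
        try {
            // URLEncoder writes spaces as '+', so turn those into %20 for use in a URI.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // mongosHosts are host:port pairs of mongos routers, not of individual shard members.
    static String buildUri(String user, String password, List<String> mongosHosts, String authDb) {
        return "mongodb://" + encode(user) + ":" + encode(password)
                + "@" + String.join(",", mongosHosts)
                + "/" + authDb + "?authMechanism=SCRAM-SHA-1";
    }

    public static void main(String[] args) {
        // Hypothetical hosts and credentials, for illustration only.
        List<String> hosts = Arrays.asList("mongos1.example.net:27017", "mongos2.example.net:27017");
        System.out.println(buildUri("appUser", "p@ss:word", hosts, "admin"));
        // prints mongodb://appUser:p%40ss%3Aword@mongos1.example.net:27017,mongos2.example.net:27017/admin?authMechanism=SCRAM-SHA-1
    }
}
```

The resulting string is what would be handed to the driver in place of a URI that lists shard members.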

If you are querying (using the find method) an unsharded collection, the query is directed to the primary shard. If your query is trying to find data in a sharded collection, then the query may access one or more shards to get the data, depending upon the query filter.

For a sharded collection, if the query filter includes the shard key, then your query is likely to be a targeted query (this is an efficient operation). Otherwise it is going to be a broadcast (or scatter-gather) operation, which has to contact every shard and is typically much slower.
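To make the targeted versus broadcast distinction concrete, here is a toy model in plain Java, without the driver. The chunk ranges, the integer keys and the names `RoutingSketch` and `shardsFor` are made up for illustration; only the shard names are borrowed from this thread. It is not how mongos is implemented, just a sketch of the idea that an equality filter on the shard key can be routed to the one shard owning that key, while a filter without the shard key has to go to every shard.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of query routing (hypothetical, not the mongos implementation).
class RoutingSketch {

    // A chunk owns the shard-key values in [min, max) and lives on exactly one shard.
    static final class Chunk {
        final int min;
        final int max;
        final String shard;

        Chunk(int min, int max, String shard) {
            this.min = min;
            this.max = max;
            this.shard = shard;
        }
    }

    static final String SHARD_KEY = "_id";

    // Made-up chunk layout: one chunk per shard.
    static final List<Chunk> CHUNKS = Arrays.asList(
            new Chunk(0, 100, "vnsh1"),
            new Chunk(100, 200, "vnsh2"),
            new Chunk(200, 300, "vnsh3"));

    // Returns the shards that an equality filter would have to contact.
    static Set<String> shardsFor(Map<String, Integer> filter) {
        Set<String> targets = new LinkedHashSet<>();
        Integer key = filter.get(SHARD_KEY);
        for (Chunk chunk : CHUNKS) {
            // No shard key in the filter: every chunk, and so every shard, is a candidate.
            if (key == null || (key >= chunk.min && key < chunk.max)) {
                targets.add(chunk.shard);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Filter on the shard key: targeted at a single shard.
        System.out.println(shardsFor(Collections.singletonMap("_id", 150)));   // prints [vnsh2]
        // Filter on another field: broadcast to all shards.
        System.out.println(shardsFor(Collections.singletonMap("status", 1)));  // prints [vnsh1, vnsh2, vnsh3]
    }
}
```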

This is the common way of querying data from the sharded cluster. You do not access or query any of the shards directly in a sharded cluster from an application (in your case the Java program).

tayfun_yalcinkaya · December 9, 2021, 1:37pm

Thank you for your answer @Prasad_Saya;

When we give the mongos IPs in the connection string, we have performance issues, as you mentioned. Can you give an example of a find() command with a shard key? Normally our code executes the find() command without a shard key. I would like to see the syntax of a find() command with a shard key. Could you please provide an example?

Thank you, Tayfun

Prasad_Saya · December 9, 2021, 1:47pm

Hello @tayfun_yalcinkaya, first of all, you need to tell us whether your collection is sharded. If it is, then you should know what the shard key is (it is one or more document fields in the sharded collection).

The syntax of find is no different from that of a find operation on an un-sharded (or “normal”) collection. You just use the shard key field within the query filter.

tayfun_yalcinkaya · December 10, 2021, 12:07pm

Hi @Prasad_Saya, thank you for your answer. I understand that the find() query should include the shard key field of the document. I have pasted the shard status output below. Each shard key looks the same; is that possible? Should the shard keys be changed on the Mongo side?

--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("5c656ed503fc77cf107abcf8")
  }
  shards:
    { "_id" : "vnsh1", "host" : "vnsh1/ X.X.X.X:27018,X.X.X.X:27018", "state" : 1 }
    { "_id" : "vnsh2", "host" : "vnsh2/ X.X.X.X:27018, X.X.X.X:27018", "state" : 1 }
    { "_id" : "vnsh3", "host" : "vnsh3/ X.X.X.X:27018, X.X.X.X:27018", "state" : 1 }
  active mongoses:
    "4.2.12" : 2
  autosplit:
    Currently enabled: yes
  balancer:
    Currently enabled: yes
    Currently running: no
    Failed balancer rounds in last 5 attempts: 0
    Migration Results for the last 24 hours:
      123 : Success
  databases:
    { "_id" : "config", "primary" : "config", "partitioned" : true }
      config.system.sessions
        shard key: { "_id" : 1 }
        unique: false
        balancing: true
        chunks:
          vnsh1 342
          vnsh2 341
          vnsh3 341
        too many chunks to print, use verbose if you want to force print
    { "_id" : "order", "primary" : "vnsh3", "partitioned" : true, "version" : { "uuid" : UUID("47dd7d3b-7a57-45fc-8519-3490d6c7827c"), "lastMod" : 1 } }
      order.orderItemStatus
        shard key: { "_id" : 1 }
        unique: false
        balancing: true
        chunks:
          vnsh1 354
          vnsh2 354
          vnsh3 353
        too many chunks to print, use verbose if you want to force print
      order.orderItems
        shard key: { "_id" : 1 }
        unique: false
        balancing: true
        chunks:
          vnsh1 1167
          vnsh2 1167
          vnsh3 1166
        too many chunks to print, use verbose if you want to force print
      order.orderRequest
        shard key: { "_id" : 1 }
        unique: false
        balancing: true
        chunks:
          vnsh1 949
          vnsh2 949
          vnsh3 948
        too many chunks to print, use verbose if you want to force print

Thank you, Tayfun.

Prasad_Saya · December 10, 2021, 12:33pm

Hello Tayfun,

Please follow the documentation on sh.status to interpret the status output (its fields, their meaning, etc.). The Output Examples and Output Fields sections have the details.

From your sharding status output, it looks like you have:

A sharded database called order.
Three sharded collections: orderRequest, orderItems and orderItemStatus.
Each of the 3 collections has _id as its shard key; this is the shard key definition (shard key: { "_id" : 1 }).