Lab: Shard a Collection


I imported the file, created the index, enabled sharding for m103, and sharded the products collection using sku as the shard key, but the test fails.

Have I missed something?

Please see the details below.
shard1:PRIMARY> db.products.find().count()

--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5eb87031c6dc5365eb87b903")
{ "_id" : "shard1", "host" : "shard1/localhost:27001,localhost:27002,localhost:27003", "state" : 1 }
{ "_id" : "shard2", "host" : "shard2/localhost:27007,localhost:27008,localhost:27009", "state" : 1 }
active mongoses:
"4.0.5" : 1
Currently enabled: yes
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
{ "_id" : "config", "primary" : "config", "partitioned" : true }
shard key: { "_id" : 1 }
unique: false
balancing: true
shard1 1
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shard1 Timestamp(1, 0)
{ "_id" : "m103", "primary" : "shard2", "partitioned" : true, "version" : { "uuid" : UUID("ece7148c-5aba-4ebe-ba8c-0167e9b4ee72"), "lastMod" : 1 } }
shard key: { "sku" : 1 }
unique: false
balancing: true
shard2 1
{ "sku" : { "$minKey" : 1 } } -->> { "sku" : { "$maxKey" : 1 } } on : shard2 Timestamp(1, 0)


Test result:

2 total, 0 passed, 1 skipped:
[FAIL] "the dataset is imported to m103.products"
(in test file /tmp/, line 14)
`[[ $data == "imported" ]]' failed

  • did you import the dataset to 'm103.products'?

[SKIP] "m103.products uses the correct shard key"

I also noticed that the database m103 was created on shard2 but is empty :frowning:

shard2:PRIMARY> db.products.find().count()

Best regards,

Your products count is not OK.
Which file did you use for the import?
Can you show the import result?

Here is the script and the import result:

mongoimport --port 27001 -u m103-admin -p m103-pass --authenticationDatabase admin --db m103 --collection products --file "/dataset/products.json"

2020-05-11T02:41:06.196+0000 connected to: localhost:27001
2020-05-11T02:41:06.208+0000 dropping: m103.products
2020-05-11T02:41:07.989+0000 imported 9966 documents

I am not sure whether anything has changed in the lab, but the count is not correct.
What is the size of your products.json?
Also, are you using the correct port for the import?
What do the instructions say?

Hi, here is the command that I used. I do not have access to the file, so I cannot give its size.

Best regards,

mongoimport --port 27001 -u m103-admin -p m103-pass --authenticationDatabase admin --db m103 --collection products --file "/dataset/products.json"

Hi @Farouk_BERROUBA,

Sorry for the delayed response. I missed your post.

The document count is correct.

But the issue is in your import command, which uses --port 27001.

Why are you importing data on port 27001? mongos is not running on this port.
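
A quick way to tell which kind of process a connection points at: a mongos answers the isMaster command with msg set to "isdbgrid", while a replica set member (a prompt like shard1:PRIMARY>) does not. The sketch below wraps that check in a small helper. The name isMongos and the two sample documents are made up for illustration and are not real server output; the example as written assumes Node.

```javascript
// Returns true when the server that answered is a mongos router.
// `helloResult` is the document returned by db.isMaster() (or db.hello()
// on newer servers); a mongos sets its msg field to "isdbgrid".
function isMongos(helloResult) {
  return Boolean(helloResult) && helloResult.msg === "isdbgrid";
}

// Hand-written stand-ins for the two kinds of reply (illustrative only).
const fromMongos = { ismaster: true, msg: "isdbgrid" };
const fromShardMember = { ismaster: true, setName: "shard1" };

console.log(isMongos(fromMongos));      // true
console.log(isMongos(fromShardMember)); // false
```

In the mongo shell the same helper would be called as isMongos(db.isMaster()) before running the import against that port.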

Please go back to the lab and re-read the instructions.

Hope it helps!

Let us know if you have any questions.

~ Shubham

Why is sku a good shard key?


Welcome to the Community Forum @Daniel_Martinez,

I can see that this post has not received a reply yet. We apologize for that.
Also, your question is not entirely clear to us. Could you be more specific?

Meanwhile, I assume that you were asking:

What makes a good shard key?

The following points make a good shard key:

  1. The distribution of reads and writes
    If all of your reads go to the same replica set, your working set has to fit into RAM on one machine. By splitting reads evenly across all replica sets, you can scale your working set size linearly with the number of shards, and you use RAM and disks equally across all machines.

  2. The size of the chunks
    MongoDB can split a large chunk into smaller ones only at points where the shard key values differ. If too many documents share the same shard key value, you end up with big chunks that cannot be split.

  3. The number of shards each query hits
    Ideally, most queries hit as few shards as possible. The latency of a query depends directly on the latency of the slowest server it hits, so the fewer shards you hit, the faster queries are likely to run.
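
Points 2 and 3 can be illustrated with a toy model. This is only a sketch and not MongoDB's actual splitting or routing logic: the function names splitIntoChunks and shardsForQuery, the chunk map, and the sample key values are all hypothetical. It assumes numeric shard key values and runs under Node.

```javascript
// Toy chunk splitter: documents are sorted by shard key value and packed into
// chunks of at most maxDocs, but a split may only happen between two
// different key values. Equal keys always stay in the same chunk.
function splitIntoChunks(keys, maxDocs) {
  const sorted = [...keys].sort((a, b) => a - b);
  const chunks = [];
  let current = [];
  for (const key of sorted) {
    const isNewValue = current.length > 0 && current[current.length - 1] !== key;
    if (current.length >= maxDocs && isNewValue) {
      chunks.push(current);
      current = [];
    }
    current.push(key);
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Toy router: with a shard key value, only the owning shard is contacted;
// without one, every shard has to be asked.
function shardsForQuery(chunkMap, shardKeyValue) {
  if (shardKeyValue === undefined) {
    return [...new Set(chunkMap.map((c) => c.shard))];
  }
  const owner = chunkMap.find((c) => shardKeyValue >= c.min && shardKeyValue < c.max);
  return owner ? [owner.shard] : [];
}

// Many distinct key values: chunks stay within the size limit (sizes 2, 2, 2).
console.log(splitIntoChunks([1, 2, 3, 4, 5, 6], 2).map((c) => c.length));
// One key value repeated: that chunk cannot be split (sizes 5, 1).
console.log(splitIntoChunks([7, 7, 7, 7, 7, 8], 2).map((c) => c.length));

// Hypothetical chunk map with two ranges on two shards.
const chunkMap = [
  { min: -Infinity, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];
console.log(shardsForQuery(chunkMap, 42));        // only shard1
console.log(shardsForQuery(chunkMap, undefined)); // shard1 and shard2
```

The second call to splitIntoChunks shows why a key with many repeated values is a poor choice: the chunk holding the repeated value grows past the limit because there is no point at which it can be split.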

I hope this helps!
Reach out to us with your specific questions and we will be glad to help.

See you in forums.