Availability of non-sharded data

Hi MongoDB gurus out there,

I have a scenario related to tag sharding that I need some assistance with. Or maybe there is a better way to handle what we want.

For example, our mongo node configuration has the following setup:

Data nodes:

US replica set: NODE_US_A, NODE_US_B, NODE_US_C

EU replica set: NODE_EU_A, NODE_EU_B, NODE_EU_C

Asia replica set: NODE_ASIA_A, NODE_ASIA_B, NODE_ASIA_C

Config Server nodes (replica set):

US: CS_US
EU: CS_EU
Asia: CS_ASIA

Application access to mongodb data is through mongos in the local region.

Because of the geographic separation, network latency is sometimes high. Replicating the whole database (over 750GB) between the continents would be impossible. Therefore we are thinking of using a tag sharding setup so that data specific to each region stays in its region. However, there are some common collections that all 3 locations need to share.

Example collections:

  • company (common data)
  • market (common data)
  • customer (local region only)

I used the tag sharding example provided in the mongodb documentation on our test environment. Tag sharding seems to be able to do what I want: customer records are split across the shards US, EU, and Asia, and only Asia customer records are available in the Asia replica set.
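
For anyone following along, here is a rough sketch (in Python) of the admin commands such a tag/zone setup might boil down to. It only builds the command documents and does not connect to anything. The database name `appdb`, the shard key `{region, customer_id}`, and the shard and zone names are all assumptions made for illustration, not our real configuration.

```python
# Sketch only: the database name, shard key, shard names and zone names
# below are assumptions, not taken from a real deployment.
try:
    # MinKey/MaxKey ship with pymongo's bson package.
    from bson.max_key import MaxKey
    from bson.min_key import MinKey
    MIN_KEY, MAX_KEY = MinKey(), MaxKey()
except ImportError:
    # pymongo not installed: fall back to readable placeholders.
    MIN_KEY, MAX_KEY = "<MinKey>", "<MaxKey>"

NAMESPACE = "appdb.customer"  # hypothetical "<database>.<collection>"
# Zone (tag) name -> shard name, assuming one shard per regional replica set.
ZONE_TO_SHARD = {"US": "US", "EU": "EU", "ASIA": "Asia"}


def build_zone_commands(namespace, zone_to_shard):
    """Return the admin command documents for a region-pinned collection."""
    db_name = namespace.split(".", 1)[0]
    commands = [{"enableSharding": db_name}]
    # Tag every shard with its zone.
    for zone, shard in zone_to_shard.items():
        commands.append({"addShardToZone": shard, "zone": zone})
    # Shard on a compound key whose prefix is the region.
    commands.append(
        {"shardCollection": namespace, "key": {"region": 1, "customer_id": 1}}
    )
    # Pin each region's whole key range to the matching zone.
    for zone in zone_to_shard:
        commands.append({
            "updateZoneKeyRange": namespace,
            "min": {"region": zone, "customer_id": MIN_KEY},
            "max": {"region": zone, "customer_id": MAX_KEY},
            "zone": zone,
        })
    return commands


if __name__ == "__main__":
    for command in build_zone_commands(NAMESPACE, ZONE_TO_SHARD):
        print(command)
```

With pymongo, each document could then be sent through a mongos with something like `client.admin.command(command)`. Note that nothing in this setup covers “company” or “market”, which is exactly the gap I am asking about.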

But what I cannot figure out is how to handle collections like “market” & “company”. We want these collections to be available in all regions.

How can we get these types of common tables to be available in all regions? From what I understand, sharding means splitting the data based on shard keys. If they are sharded, some data will not be available in all regions.

The reason I want the common data collections to be available everywhere is to cover the case of high network latency between the geo regions, when a local mongos talks to the local mongo config server to query the common data collections. If the primary for the “company” collection is in the US, and network latency between Asia and the US is so high that proper mongo communication is not feasible, then “company” will not be available in Asia, as there is no “company” collection in the Asia shard.

Therefore, I want to see if you guys have any advice on this.

Thanks in advance.

Hello @Eric_Wong, just a couple of comments here.

But what I cannot figure out is how to handle collections like “market” & “company”. We want these collections to be available in all regions.

How can we get these types of common tables to be available in all regions? From what I understand, sharding means splitting the data based on shard keys. If they are sharded, some data will not be available in all regions.

The non-sharded collection data will reside on the primary shard only (and the primary shard is going to be local to one of the regions in your setup). This is because unsharded collection data is not distributed among the shards. This data will be available to all regions (US, EU and Asia) through mongos, and your queries on these collections will hit the primary shard.
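
To make the routing concrete, here is a toy model in Python of which shards a query would touch. It is not MongoDB code, just an illustration of the rule above. The database name `appdb`, the choice of US as the primary shard, and the region-to-shard mapping are assumptions; in a real cluster the primary shard of each database is recorded in the `primary` field of `config.databases`.

```python
# Toy model of mongos targeting, for illustration only (not MongoDB code).
# All names below are assumptions about the poster's setup.
DATABASE_PRIMARY = {"appdb": "US"}       # database -> primary shard
SHARDED_COLLECTIONS = {"appdb.customer"}  # zone-sharded by region
SHARD_FOR_REGION = {"US": "US", "EU": "EU", "ASIA": "Asia"}


def target_shards(namespace, region=None):
    """Return the set of shards a query on `namespace` would touch."""
    db_name = namespace.split(".", 1)[0]
    if namespace not in SHARDED_COLLECTIONS:
        # Unsharded collections live only on the database's primary shard.
        return {DATABASE_PRIMARY[db_name]}
    if region is not None:
        # The query includes the region prefix of the shard key,
        # so it can be routed to that region's shard alone.
        return {SHARD_FOR_REGION[region]}
    # No shard key in the query: every shard has to be asked.
    return set(SHARD_FOR_REGION.values())


if __name__ == "__main__":
    print(target_shards("appdb.company"))           # primary shard only
    print(target_shards("appdb.customer", "ASIA"))  # regional shard only
    print(target_shards("appdb.customer"))          # all shards
```

In this model an application in Asia reading “company” always ends up on the US shard, so it works as long as the link to the US is usable, while a region-scoped “customer” query stays on the local shard.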