Using MongoDB as a key-value store that fetches multiple keys together

We are investigating whether we can use MongoDB as a cache where we fetch 100 keys every time.

Storage:
We have 1,000 customers and up to 100,000 keys per customer, where each value is a ~5 KB JSON document. We expect the total data size to be around 200 GB.

Access pattern:
In each request, we fetch 100 keys together (all of them belonging to the same customer).

Question 1:
If we use Hash(_id) as the shard key, each request will need the mongos router to aggregate data from multiple shards. Is that OK?
Is the mongos router efficient when I use an $in clause with multiple _ids that live on different physical nodes in a sharded cluster?

Question 2:
Is there a sharding pattern that makes this access more efficient?

If it were Redis, I could use customerId as a cluster hash tag so I could use MGET to fetch multiple keys together.
If it were Cassandra, I could make (customer_id, key) the primary key, with the key portion as the clustering key, to ensure queries go to the same node for efficient retrieval.

I am new to MongoDB. If I shard by customerId here, I am wondering whether that is optimal, as it can lead to large chunks (which I read somewhere in the docs is bad).


Hi @Hasan_Kumar

Welcome to MongoDB Community.

First, I am not certain that 200 GB of data justifies a sharded cluster. Why do you expect to need one?

Now, looking at the use case: if you need to query 100 keys per customer together, why not keep them in the same document? For example:

{
  customerId : "xxxxx",
  keys : [
    { "k" : "key1" , "v" : "value1" },
    ...
    { "k" : "key100" , "v" : "value100" }
  ]
}

Will that work for you? Then you can index customerId and query a single document, which will give the best performance for fetching the 100 keys.
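As a minimal mongosh sketch (the collection name "cache" is just an assumption):

db.cache.createIndex({ customerId : 1 })

// One indexed lookup returns the single document holding all 100 keys
db.cache.findOne({ customerId : "xxxxx" })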

If you need to update the keys or query by a specific key, you can use the attribute pattern with an index like { "keys.k" : 1, "keys.v" : 1 }.
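For example (again assuming the hypothetical "cache" collection):

// Attribute-pattern index over the array elements
db.cache.createIndex({ "keys.k" : 1, "keys.v" : 1 })

// Find the document(s) containing a specific key
db.cache.find({ "keys.k" : "key42" })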

For updates, please look at array filters:
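A hedged sketch of such an update ("key42" and "newValue" are placeholders):

db.cache.updateOne(
  { customerId : "xxxxx", "keys.k" : "key42" },
  { $set : { "keys.$[elem].v" : "newValue" } },
  { arrayFilters : [ { "elem.k" : "key42" } ] }
)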

In case you need to shard this collection, you may consider hashed sharding by “customerId”; then each customerId fetch will target a single shard.
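Something like this sketch ("mydb" and "cache" are placeholder names):

sh.enableSharding("mydb")
sh.shardCollection("mydb.cache", { customerId : "hashed" })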

I suggest reading the following blog posts:

Best regards,
Pavel


Thanks @Pavel_Duchovny

Thanks for the input that sharding may not be needed for 200 GB of data :+1:

Question: Each customer can have up to 100K keys, each value up to 5 KB. If I store all keys for a single customer in one document, that means a single document of ~500 MB. Isn't that a problem?

But when accessing, we need only 100 of those keys (assume filtered pagination).

@Hasan_Kumar,

Limit the number of keys per document to 100 and bucket them into 1,000 documents, giving 100K keys overall (100 × 1,000 docs).

So each customer will have 1,000 documents in a collection, which is totally fine.
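Each bucket document could look roughly like this (a sketch; the "bucket" field is my assumption for spreading the keys across documents):

{
  customerId : "xxxxx",
  bucket : 0,            // 0–999, one of the 1,000 buckets
  keys : [
    { "k" : "key1" , "v" : "value1" },
    ...                  // up to 100 entries per document
  ]
}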

Will that work?

Thanks
Pavel

I think the answer depends on whether you are always fetching a specific 100 keys or an arbitrary 100 keys (for a particular customer). Can you tell us more about the use case? Are you adding new keys for each customer over time? Are you fetching the most recent 100 keys, or choosing them some other way?

You mention customerId as a cluster hash tag; in MongoDB you can have a secondary index on any field or combination of fields (like { customerId : 1, keyId : 1 }, for instance). Is there a reason you're looking to store things as key-values rather than using the full power of documents?

Asya

Thanks @Asya_Kamsky & @Pavel_Duchovny
Each customer (organization) has up to 100,000 tasks (created by different users) created over a period of a few months.

I need 100 arbitrary keys every time, so storing them across multiple documents may not be ideal, as I don't know beforehand which document a given key will be in.
I intend to store the tasks as documents, but given the current access pattern, I will need the full document every time (hence calling it a key-value store).
E.g., give me the documents with ids 1, 23, 56, 799, …, 100212 (all belonging to the same customer, who owns 100,000 such documents). (Note: the 100 ids to fetch are not completely random; they are determined by a query to our Postgres database.)

Also, what is the maximum recommended document size?

Hi @Hasan_Kumar,

The document size limit is 16 MB. While you can potentially have documents near that size, it's not really recommended due to the risk of hitting the limit, and moving 16 MB per document over the network to the client would need an extreme justification…

As @Asya_Kamsky mentioned, why not store the key data clustered by customerId and keyId:

{
  customerId : ... ,
  keyId : ... ,
  value : ... ,
  ...
}

Index : { customerId : 1, keyId : 1 }
You can then query all documents for a customer for a set of key ids:

db.coll.find({ customerId : "xxx", keyId : { $in : [ 1, 23, ... ] } })
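Putting it together in mongosh (a sketch; "coll" is a placeholder name, and the ids stand in for the ~100 ids returned by the Postgres query):

db.coll.createIndex({ customerId : 1, keyId : 1 })

const ids = [ 1, 23, 56, 799 ]  // ...the ids from Postgres
db.coll.find({ customerId : "xxx", keyId : { $in : ids } })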

Thanks
Pavel

Thank you! Will try the suggestions here.
