Mongo model for dynamic fields

Prasad_Kini · December 28, 2022, 3:48pm

Hello,
I am very new to Mongo and I am trying to get an understanding on best practices for modeling the data.
These are my requirements:

The application will be used by multiple clients
Clients decide the fields that they want to have at runtime. There is no limit to the number of fields a client can have
The fields can be of any data type i.e. int, datetime, string, list, multilist etc
4. The model has to support sorting, grouping, aggregates oy the dynamic fields

I have tried out a couple of models so far.

Create the dynamic fields in the mongo collection as and when needed. In this case, the same collection will have different fields for different clients. e.g. below

{
	client_id: 123,
	product_id: 456,
	category: "Home",
	product_name: "Television",
	listing_date: "2022-12-28",
	quantity: 97
}

Save the dynamic fields as key value pairs. e.g. below

{
	client_id: 123,
	product_id: 456,
	fields:[
		{
			key: "category",
			value: "Home"
		},
		{
			key: "product_name",
			value: "Television"
		},
		{
			key: "listing_date",
			value: "2022-12-28"
		},
		{
			key: "quantity",
			value: 97
		}
	]
}

Are there any other options for modeling such data?
Please note that the number of documents in the collection per client will run into 100s of thousands and the sorting, grouping etc need to be highly performant.

Thanks in advance for the guidance!

Yilmaz_Durmaz · December 28, 2022, 4:11pm

What you describe is polymorphic behavior and is well suited for document databases, and MongoDB. as long as you have a discriminator for documents ( this belongs to client A ), and have few static fields (at least an “_id” and “version/owner”) you can store any kind of document in a single collection. you can even add relations inside the same collection with different keys.

enforce some fields to be of a specific type for consistency (like “_id” to be ObjectID and unique)
and add a versioning field specific to your clients, then require this field in every query.

the downside will be creating indexes as each client would require a different kind of fields to index, but that is a story for another time.

Prasad_Kini · December 28, 2022, 4:44pm

Hi Yilmaz,
Thanks for the quick response!

Glad to know that Mongo will be able to meet my requirements.

My understanding is that each collection cannot have more than 64 indexes. We want to build a generic solution that’d avoid having to create unique indexes per client. Moreover, having different indexes for each client would mean that I would have to build some module that’d create them at runtime based on some data analytics or some other logic.

Which of the options that I have listed will work best for the expected functionality? Is there a third option?

Thanks again!

Yilmaz_Durmaz · December 28, 2022, 6:46pm

let me ask you a few important questions before trying any further:

do you own the host pc (physical or cloud) where MongoDB is installed, or do you use Atlas?
are you yourself own the MongoDB server or just a user with access to only your database/collections?
what do you exactly mean by “clients”? are they some IoT devices with different purposes, or real people/companies you want to serve?

because there is a border where you need to decide between using a database and implementing whole database servicing.

Honestly, my opinion is to implement an administration interface (if it is not Atlas), then interact with the database to create users, create databases and assign them to users, set other security measures, and let users create their collections and fill documents as as they like, and also let them create their own indexes since they own the database, and charge them for their usage (disk, cpu, indexing etc.). This is similar to what Atlas does for shared clusters. (you have more control and resources on private paid clusters)

By the way, since you said being new, here is an important recap:

document is a BSON/JSON/eJSON object (depending on where you are looking at it)
collection is where you keep similar documents
database is where you keep collections and set user access
server is where you keep databases, users, logs, security etc.

Databases and Collections — MongoDB Manual

your design does actually belong to the database and server layer. so trying to give a solid answer depends on your own resources

Prasad_Kini · December 29, 2022, 12:22am

Hi Yilmaz,
Answers to your questions below:

The db will be hosted on Atlas
We will own the cluster on Atlas
The “clients” are actual customers from pharma, automotive, retail and other industries
The clients will use a web application for their needs. They would not have access to the db at all
They will use the web application to register on the application and run through a configuration wizard that’d create the fields for them
We are responsible for providing all functionality in the application and also ensuring that the performance is at par with other web applications

Hope this answers your questions. Thanks again for indulging me

Regards,
Prasad

Yilmaz_Durmaz · December 29, 2022, 8:41am

Please bear with me,

I forgot to mention the 5th layer that has a name easily confused: server itself. I mean the host pc so there is a real/virtual host server pc and there are database server programs. we can have only 1, or as many as we need, mongod (the program) instance on a single host pc. the same holds true if zoom out our sight: a data center can have a single powerful data hosting pc, or as many as needed.

The reason I raise this ordered architecture is about making decisions about where exactly our single document should belong. If your design to keep client data in a single collection proves to be hard to implement, you need to consider having a collection for each user which will eliminate the indexing problem. If a single client’s data exceeds a certain amount, you need to consider having a database (not the server, naming can also be confusing here). And If you want to serve a bigger degree of data, you get the idea, go one layer up.

In all of these possibilities, if you design carefully, your clients would not notice any difference if you choose one or another. they would not even notice if you change to some other database other SQL/NoSQL server other than MongoDB. in fact, you can leverage cooperation between them to cover their weak sides. None of these would be noticable by clients if your design is good.

Now back to your “document” focused design. If you try an all-free approach, the indexing will become bloated and thus performance will degrade. I think you need to pour some thinking into field names and types to guide your clients. for example, try preventing them to name the field “my_age” and enter their pet’s name (a string) as the value.

The second approach from your first post has an advantage over the other: you can have a “search index” over the key field. but you still have to deal with the array structure.

“Third option” as you asked, along with the first and second, need a longer time to discuss than we have here because this will be the heart of your design. Considering “100s of thousands” documents per client, I would go up on the layers and settle on a collection or database per client. think of it as folder tree structure; a folder per client. you are required to implement functions to switch between “folders” for each client, but the logic you end up with for the “document” does not need to change, and it will be much easier to administer each client plus better performance.

By the way, I am sorry for the long lines to read. Model designing sounds like an easy thing, but might be the hardest because there are too many things to consider. But again, as long as you keep backup data, you can craft a whole new model and apply it without clients ever noticing. So, decide on one model and start developing so you can actually test things on the way if you prefer hands-on experience.

Prasad_Kini · December 29, 2022, 4:57pm

Hi Yilmaz,
Thank you so much for your pointers. This is a new db that I am looking to migrate to from SQL server and I want the foundation to be laid right so that we don’t run into fundamental issues later.

It seems like at this point option 2 (key value pairs) and 3 (different collection/client) are the only viable options so far. Seems like I have a lot of thinking to do.

Does option 3 require any design considerations on Atlas right away? Would it cause space/memory requirements to grow faster than option 2?

Thanks again!

Yilmaz_Durmaz · December 29, 2022, 11:16pm

you may want to check this post before moving on:
Creating a new database vs a new collection vs a new cluster - Ops and Admin / Installation & Upgrades - MongoDB Developer Community Forums

collection per client will give flexibility for indexes plus queries will be faster as you will be searching only on that client’s collection. Yet there is a limit (about 10000) on the total collections you can have in one database. so if the number of clients you will have may go above that, consider the next layer.

database per client will give more power if you want to allow clients to have still-relational data as the tables in an xyzSQL server corresponds to the collections in MongoDB.

database is the highest level in a mongod instance, and instead of scaling database your collection resides, you can create new clusters if the number of clients starts increasing. you can even group client types into clusters.

things to consider can really be overwhelming. but again, take your time. as long as you keep a backup, feel free to experiment with ideas.

Prasad_Kini · December 30, 2022, 12:42am

Hi Yilmaz, thanks for sharing the post on db vs collection vs cluster. I will go through it carefully and let you know if I need further help.

Thank you so much for all your inputs!

system · January 6, 2023, 12:08am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.