MongoDB for complex filter queries?

I’m bootstrapping a SaaS where I’ve got several customers, each having their own set of “end_customers”. On average, a customer has around 3,000 to 5,000 end customers, but this could range anywhere from 1,000 up to 300k.

Each customer can define up to 30 custom fields for their end customers, with a mix of field types including text, number, date, or a constrained text value. All of this data is currently stored in a MariaDB database, with a main end_customer table and an end_customer_field table for the custom fields. However, I’ve run into some serious performance issues when trying to allow my customers to execute dynamic filter queries on their end customers.

These filter queries involve joining the end_customer table with the end_customer_field table up to 30 times, and converting text values to numbers or dates for certain data types. I’ve also implemented a variety of logical comparison operators based on the data type. Unfortunately, more complex filters are currently taking several minutes to run on my local system before timing out, so I urgently need to find a solution to this problem.

I’ve been exploring a few potential approaches to improve performance, including limiting the number of search criteria, restructuring the data storage, and even considering a switch to a different type of database. For instance, I could change the structure of the end_customer_field table to have multiple columns for different data types, or I could split it into separate tables per data type. This would complicate the join/lookup logic, but could avoid the need for casting.

Another option I’m considering is a switch to a NoSQL database, such as MongoDB. I haven’t used NoSQL databases before, but I’ve heard they could potentially handle this type of data and querying more efficiently. However, I’m not exactly sure how to go about this, or whether it would really solve my performance issues.

I’m reaching out to the community here to see if anyone has any insights or suggestions. Would a NoSQL database be a good fit for this use case? How could I structure my data in MongoDB (or another NoSQL database) to handle these dynamic filter queries efficiently?

Any guidance or advice would be greatly appreciated.

Hello @Marcus_Biel ,

Welcome to The MongoDB Community Forums! :wave:

I saw that you haven’t had a response to this topic yet. Were you able to find guidance for this?
If not, can you please confirm whether my understanding of your use case is correct?

As you mentioned, your application is supposed to handle a large amount of unstructured data, and MongoDB may be a good fit for such solutions. For guidance on schema design patterns, please refer to
Building with Patterns: A Summary. I think the Attribute Pattern along with the Extended Reference Pattern could be a good fit for your use case: the former provides the support required for freeform fields, while the latter helps with minimising joins.

Attribute pattern example
If our data collection were on bottles of water, our attributes might look something like:

"specs": [
    { k: "volume", v: "500", u: "ml" },
    { k: "volume", v: "12", u: "ounces" }
]

Here we break the information out into keys and values, “k” and “v”, and add in a third field, “u” which allows for the units of measure to be stored separately.

With the Attribute Pattern, a single compound index can then cover every attribute:

{"specs.k": 1, "specs.v": 1, "specs.u": 1}

It makes indexing the documents easier, targeting many similar fields per document. By moving this subset of data into a key-value sub-document, we can use non-deterministic field names, add additional qualifiers to the information, and more clearly state the relationship between the original field and value. When we use the Attribute Pattern, we need fewer indexes, our queries become simpler to write, and our queries become faster.
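To connect this back to your dynamic-filter use case, here is a minimal sketch of how one filter criterion per custom field could be translated into a MongoDB query against an Attribute Pattern array. The collection layout (a `custom_fields` array of `{k, v}` pairs) and the helper names are my own assumptions for illustration, not something prescribed by the pattern:

```javascript
// Map a UI-level operator onto the corresponding MongoDB comparison operator.
const OPS = { eq: "$eq", ne: "$ne", gt: "$gt", gte: "$gte", lt: "$lt", lte: "$lte" };

// Build one query clause for a single custom field. $elemMatch forces both
// conditions (the key match and the value comparison) onto the SAME array
// element, which is what lets one compound index on {k, v} serve every field.
function buildAttributeFilter(field, op, value) {
  if (!(op in OPS)) throw new Error(`unsupported operator: ${op}`);
  return { custom_fields: { $elemMatch: { k: field, v: { [OPS[op]]: value } } } };
}

// Several criteria combine with $and, one $elemMatch per custom field —
// no joins and no casting, provided v is stored with its native BSON type
// (numbers as numbers, dates as Date) rather than as text.
function buildQuery(criteria) {
  return { $and: criteria.map(c => buildAttributeFilter(c.field, c.op, c.value)) };
}

const query = buildQuery([
  { field: "contract_value", op: "gte", value: 1000 },
  { field: "region",         op: "eq",  value: "EMEA" },
]);
console.log(JSON.stringify(query));
```

The important design point is the last comment: storing each value in its real type inside `v` is what removes the text-to-number/date casting you currently do in MariaDB.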

Extended Reference Pattern example
In an e-commerce application, the idea of an order exists, as does a customer, and inventory. They are separate logical entities.

Instead of embedding all of the information or including a reference to JOIN the information, we only embed those fields of the highest priority and most frequently accessed, such as name and address.

By identifying fields on the lookup side and bringing those frequently accessed fields into the main document, performance is improved. This is achieved through faster reads and a reduction in the overall number of JOINs. Be aware, however, that data duplication is a side effect of this schema design pattern.
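As a concrete sketch of the pattern (all field names here are illustrative, not a prescribed schema): the order keeps a reference to the full customer document, but also embeds the few customer fields it reads on almost every access.

```javascript
// Extended Reference sketch: duplicate only the high-priority,
// frequently read customer fields into the order document.
const order = {
  _id: "order-1042",
  customer_id: "cust-7",      // reference, kept for the rare full lookup
  shipping: {                 // duplicated "extended reference" fields
    name: "Ada Lovelace",
    address: "12 Analytical Way",
  },
  items: [{ sku: "WB-500", qty: 2 }],
};

// Rendering an order list now needs no $lookup/JOIN back to customers:
console.log(`${order._id} ships to ${order.shipping.name}`);
```

The trade-off is visible in the document itself: if the customer changes their address, every open order carrying the duplicated fields must be updated too.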

Personally, I would first test a mixture of different patterns to see which meets my requirements and provides a structure that is easy to work with in my application. Once the design pattern is decided, performance can be tuned further, for example by scaling the deployment horizontally using sharding.

Regards,
Tarun