I am trying to decide on a database that would fit roughly a trillion image metadata records per year with a dynamic schema. Is MongoDB the right database?

Use Case -

The metadata needs to match the schema below -

https://www.openmicroscopy.org/XMLschemas/OME/FC/ome.xsd

Hi @Rounak_Joshi,

Welcome to the MongoDB community!

You should basically ask yourself: why not MongoDB?

It offers:

  • A flexible schema: semi-structured and dynamic.
  • A design for big data volumes, with built-in HA and scale-out solutions (Replication and Sharding).
  • Full manageability in the cloud or On-Prem.
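To illustrate the flexible schema point, here is a minimal sketch of two metadata documents with different shapes that could sit in the same collection. The field names (`image_id`, `pixels`, `channels`, and so on) are hypothetical and only loosely inspired by OME concepts; they are not taken from the OME schema itself.

```python
# Two hypothetical image-metadata documents with different shapes.
# All field names here are illustrative assumptions, not the OME schema.
image_a = {
    "image_id": "img-0001",
    "acquired": "2020-01-15T10:30:00Z",
    "pixels": {"size_x": 2048, "size_y": 2048, "type": "uint16"},
}
image_b = {
    "image_id": "img-0002",
    "acquired": "2020-01-15T10:31:00Z",
    "pixels": {"size_x": 1024, "size_y": 1024, "size_z": 40, "type": "uint8"},
    "channels": [{"name": "DAPI"}, {"name": "GFP"}],  # only present on some images
}


def fields_only_in(doc, other):
    """Return the top-level fields that `doc` has and `other` lacks."""
    return sorted(set(doc) - set(other))


extra = fields_only_in(image_b, image_a)
print(extra)
```

With PyMongo, both dictionaries could be passed together to `collection.insert_many([image_a, image_b])` without declaring the extra fields first, assuming `collection` is a handle to an existing collection.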

Having said that, designing your schema is key for scalability down the road.

I suggest that if you don’t have much experience with MongoDB you first:

Additionally please read the following:
https://www.mongodb.com/article/mongodb-schema-design-best-practices

All articles: https://www.mongodb.com/blog/post/performance-best-practices-benchmarking

https://www.mongodb.com/article/schema-design-anti-pattern-summary

Thanks
Pavel

Greetings Pavel, thanks for sending all this information over. The only thing I would like to stress is that this is a trillion image metadata records; the image data itself is not even in scope right now, though I am sure it will come later. As mentioned, the OME schema for metadata is what is most relevant to this use case.
Having said that, let me go over all the links you sent me; that will enable me to determine the best database schema fit for this type of use case.

Also, this is all on-prem, as it is based on microscopy image metadata from experiments performed in research labs of health institutes.
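For a sense of scale, the "trillion per year" figure can be turned into an average write rate. This is only back-of-the-envelope arithmetic, assuming "trillion" means 10^12 metadata records and that they arrive evenly over a 365-day year; real lab workloads will be burstier.

```python
# Assumption: one trillion (10**12) metadata records per year, evenly spread.
DOCS_PER_YEAR = 10**12
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

avg_writes_per_second = DOCS_PER_YEAR / SECONDS_PER_YEAR
print(round(avg_writes_per_second))  # roughly 31,710 inserts per second on average
```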

If I may add to @Pavel_Duchovny's ideas, I recommend that you take some courses from https://university.mongodb.com. Some of them require little time but give you a really good idea of what you can do with MongoDB.

In particular, the courses M001, M100, M121 and M320 are suitable for getting a good idea of the capabilities.

Thanks @steevej for sending this additional information. I will be going through those courses.
I was, however, curious to know whether anyone here in the community has encountered a similar use case or problem, especially using MongoDB, and if so, how they approached solving it.

Hi @Rounak_Joshi,

I am not specifically familiar with the OME schema or its rules/limitations.

Perhaps you can highlight the main aspects?

I would say that we do have limitations in MongoDB, such as a document size that cannot exceed 16MB:

https://docs.mongodb.com/manual/reference/limits/

However, there are ways to overcome those, either by logically separating documents or by using a GridFS solution:

https://docs.mongodb.com/manual/core/gridfs/
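As a rough sketch of what "logically separating documents" can look like, the snippet below splits one large array of sub-records into several smaller child documents that each reference the parent. The names `parent_id`, `seq` and `items` are hypothetical, and the size check uses JSON length as a crude stand-in for the real BSON size, so a generous safety margin is assumed.

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # MongoDB's documented per-document limit


def approx_size(doc):
    """Rough size estimate via JSON encoding; the real BSON size will differ."""
    return len(json.dumps(doc).encode("utf-8"))


def split_into_buckets(parent_id, items, max_bytes=MAX_DOC_BYTES // 2):
    """Yield child documents, each holding a slice of `items`,
    whose estimated size stays at or under `max_bytes`."""
    bucket, seq = [], 0
    for item in items:
        candidate = {"parent_id": parent_id, "seq": seq, "items": bucket + [item]}
        if bucket and approx_size(candidate) > max_bytes:
            # Current bucket is full: emit it and start a new one with this item.
            yield {"parent_id": parent_id, "seq": seq, "items": bucket}
            bucket, seq = [item], seq + 1
        else:
            bucket.append(item)
    if bucket:
        yield {"parent_id": parent_id, "seq": seq, "items": bucket}


# Usage with a deliberately tiny limit so the split is visible.
planes = [{"index": i, "exposure_ms": 10.0} for i in range(100)]
buckets = list(split_into_buckets("img-0001", planes, max_bytes=500))
print(len(buckets), "child documents")
```

Each child document can then be stored on its own and fetched by `parent_id`, ordered by `seq`. GridFS is the other route mentioned above and is aimed at large binary files rather than structured metadata.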

Thanks
Pavel