(newbie) Collection configuration for storing very different data

Trisha_Pellis · April 3, 2022, 12:25am

Hi,

I’ve worked a lot with SQL and now I’m trying to learn how to use MongoDB.

I decided to use the game Stardew Valley for this, because that’s information I have. Now the first stumbling block I run into is how to store my data in collections to be able to look up the things I want. I have different types of data:

trees & crops: season, buy price, fruit sell price
fish: season, weather, location, hours
recipes: ingredients (will come from the previous two), sell price, buffs
machines: inputs (will come from the first two), outputs, sell price
NPCs: mainly which NPC likes which item out of all of the previous

Just to show this is about very different data. In SQL, each of these would be a separate table, there’d be extra reference tables (like seasons), the NPC liked items would probably end up being a union of data grabbed from all the previous.

In MongoDB, I’ve read collections being referred to as Mongo’s equivalent to tables, but I also get the impression it is not generally the intention to try and combine them, or grab information from multiple ones. It’s possible but seems like extra hassle and kinda not meant to be done (I may have found wrong answers, I don’t know). So say that, for example, I wanted to look up all of the resources I would need to fulfill the wants of one NPC – that means, 1) apples (method:raw), 2) bass (fish) and wheat to make the recipe of breaded fish (method: kitchen), 3) strawberry and milk to make strawberry ice cream (method: ice cream machine) etc. In other words, I basically want to be able to grab data from every last one of these categories and combine them, and that is the most frequent thing I want to do.

Do I put all of these different categories in the same collection?
Do I put them in different collections, in which case, a pointer to useful documentation of how to get the results I want would be very appreciated.
Is this too complicated for MongoDB alone? Would I need to figure this out in my actual application somehow? I am told MongoDB can do everything SQL can and is basically much better at it, so I would expect this to be possible…?

The idea behind all this is basically, if you learn how to do the most difficult thing, then everything simpler will be a piece of cake later. Please keep in mind, if possible, that right now I’m doing this with a silly game with very limited data but I would like to know the best or most recommended ways of thinking about this so I can apply this to important stuff later.

Thank you very much for your input.

Jack_Woehr · April 4, 2022, 4:41am

Hello, @Trisha_Pellis … IMHO (not a MongoDB employee!) you got the message right.
Mongo is about stuff you can represent in a document.
My wife is a potter. I wrote her a website showcasing her work in MySQL. It’s relational as can be.
Now I’m rewriting it in MongoDB. Because really, a form for a piece of pottery is more document-like than relational.
It has fields in it like:

name
price
image location in the file system
a dictionary of possibly multiple attributes (glazed, horsehair …)
a dictionary of possibly multiple categories (functional, dishwasher-safe …)

So a pottery entity is like a one-page prospectus more than a relation.
You certainly can do joins and relational stuff in MongoDB.
But to me, if you want relational calculus, use a relational DB (ducking so MongoDB loyalists don’t throw stuff at me).
If your data is primarily document-like, an RDBMS is overkill and a nuisance. Especially if what you are really trying to do is feed a website. MongoDB is very nice for that.
In your case, a game character is very much like a MongoDB document. Individual attributes (name, origin), lists of multiple attributes … Works nicely in MongoDB.
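
As a rough sketch of the "joins" mentioned above: MongoDB's aggregation framework has a `$lookup` stage that pulls matching documents in from another collection. The collection names (`npcs`, `items`), the `liked_items` field, the NPC name and the helper function are all hypothetical, chosen to fit the Stardew Valley example. The pipeline is plain Python data that could be passed to PyMongo's `Collection.aggregate`.

```python
def build_liked_items_pipeline(npc_name):
    """Return an aggregation pipeline that joins one NPC to the items it likes.

    Assumes (hypothetically) an `npcs` collection whose documents hold a
    `liked_items` array of item names, and an `items` collection in which
    each document has a `name` field.
    """
    return [
        # Keep only the NPC we are interested in.
        {"$match": {"name": npc_name}},
        # Left outer join: collect the `items` documents whose `name`
        # matches any element of the NPC's `liked_items` array.
        {
            "$lookup": {
                "from": "items",
                "localField": "liked_items",
                "foreignField": "name",
                "as": "liked_item_docs",
            }
        },
    ]


pipeline = build_liked_items_pipeline("ExampleNPC")

# With a running server and PyMongo installed, this could be run as
# (database name is an assumption):
#   from pymongo import MongoClient
#   db = MongoClient()["stardew"]
#   results = list(db.npcs.aggregate(pipeline))
```

So keeping the categories in separate collections does not rule out combining them in one query; whether that is the best model is a separate question.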

Trisha_Pellis · April 4, 2022, 8:57pm

Thanks for your answer… but I’m not sure if you’re telling me to do my thing in MongoDB or to use something else.
I started with a project in MongoDB purely for the sake of learning how Mongo works with a subject I like and for which I don’t have to go looking for the information, or use example data I don’t care about.
I’m not sure how one would determine if a certain type or collection of data is document-like. In the end, I have no trouble figuring out how to represent the data I have as documents. And yes, I would like to reach a point of displaying this information on a website. My question is purely about whether it would be more intelligent to store all those documents in one single collection, or if it would be smarter to store them in several collections (like I would in SQL), and if the latter option would make it more difficult to get the data back out in the way I want.

I very much understand that you say a pottery website would be much simpler and not require relations or anything like that; I’ve made product pages (your wife’s stuff is really pretty! I like pit-fired pottery a lot). But this, in turn, makes me wonder, from the perspective of a person who is hoping that knowing Mongo may help me get a job at some point: is that all Mongo is used for? (I understand if you don’t know, but in case you do). A lot of people seem to be touting non-relational databases as the way of the future, and relational ones as stone age stuff. But you’re telling me that it’s more a matter of what you want to use it for? That for the sake of the example I’m trying to use, relational is (in your opinion) better? Because in that case, maybe I should go for an example dataset after all. Or at least be less ambitious with my plans.

Jack_Woehr · April 5, 2022, 2:08am

@Trisha_Pellis I encourage you to proceed with MongoDB. It’s being used for all kinds of things.

I think any programming metaphor has appropriate and less appropriate domains in which it can be exercised.

I cannot claim to have a definitive understanding of what MongoDB is good for and what it may not be as good for.

I was just giving you one person’s experience, that I found MongoDB especially good for what I am doing, specifically, to create a website that presents items that are easily represented in a document.

Probably the only way for you to answer your questions is to carry out your project and come up with your own impressions.

Jack_Woehr · April 5, 2022, 2:10am

Actually, given the example you cited, modelling a game, I felt MongoDB was very appropriate. Excuse me if I was too verbose in my earlier posting and insufficiently clear.

steevej · April 5, 2022, 12:58pm

I concur with

I have done my fair share of SQL and I prefer MongoDB. I like it because it is schema-less and much closer to OOP. It is very easy to change the shape of your documents, much less so with SQL.

As for multiple collections or a single one, think about the things (the objects, in OOP terms) you are handling. For example, if you have people and products, then you would probably want them to be in different collections.

The M320 course from MongoDB University is really good for making this kind of decision. The M100 course might also be of interest since you

Also take a look at

Jack_Woehr · April 5, 2022, 1:50pm

@steevej , I think the MongoDB community “mantra” that the system is “schema-less” is somewhat misleading.

I have found in my own (limited) experience that creating validators can be an important safety measure. (@Trisha_Pellis, MongoDB validators are effectively schema declarations).
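
To illustrate what such a validator can look like, here is a minimal sketch using the `$jsonSchema` validation syntax. The `fish` collection and its field names are assumptions based on the fish attributes listed in the first post, not anything prescribed by MongoDB.

```python
# Hypothetical validator for a `fish` collection. Documents must be objects
# with at least `name`, `season` and `location`; `weather` is optional.
fish_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "season", "location"],
        "properties": {
            "name": {"bsonType": "string"},
            # A fish may be available in several seasons, so use an array.
            "season": {"bsonType": "array", "items": {"bsonType": "string"}},
            "weather": {"bsonType": "string"},
            "location": {"bsonType": "string"},
        },
    }
}

# With a running server and PyMongo, the validator could be attached when
# the collection is created (database name is an assumption):
#   from pymongo import MongoClient
#   db = MongoClient()["stardew"]
#   db.create_collection("fish", validator=fish_validator)
```

Fields not named under `properties` are still allowed here, so the schema stays open-ended unless you restrict it further.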

steevej · April 5, 2022, 2:12pm

I would not say misleading, but I cannot find another word to better say it.

One thing is sure: you have no obligation to use validators, and I don’t. With SQL you have no choice but to go and do all your CREATE TABLE first and then your ALTER TABLE because of course your first iteration was not adequate.

With MongoDB and without the optional validators, you may skip that step and your model is very flexible.

The above is absolutely true, but optional.

Jack_Woehr · April 5, 2022, 2:27pm

That is very perceptive of you, @steevej , I struggled with how to say that in a way that did not seem pejorative. I tried “fallacy”, “shibboleth”, “myth”, etc., but none sounded right.

The above is absolutely true, but optional.

I have experienced this in other programming regimens, e.g., the Forth programming language.

We were very proud of how free-form it was.

In the end, its freedom became a barrier to acceptance, because one could not count on standard practices being present in any code body. Maintenance was very difficult if the maintainer was not the original author.

This is probably not such a barrier for MongoDB since the system possesses such a full set of “optional” features that one would be wise to employ in any enterprise application.

Stennie_X · April 7, 2022, 12:23pm

Welcome to the MongoDB Community @Trisha_Pellis !

I strongly second @steevej’s recommendation for Building with Patterns: A Summary | MongoDB Blog as well as MongoDB University courses and would add https://www.mongodb.com/developer/article/schema-design-anti-pattern-summary/. The Attribute Pattern and Polymorphic Pattern would both be relevant reading for your use case.
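
As a small illustration of the Polymorphic Pattern applied to the original question, the sketch below keeps differently shaped documents in one hypothetical `items` collection, distinguished by a `type` field. The collection is modelled as a plain Python list so the example runs without a server, and the field values are placeholders rather than verified game data.

```python
# One hypothetical `items` collection, modelled as a list of dicts.
# Every document shares `name` and `type`; the other fields vary by type.
items = [
    {"name": "wheat", "type": "crop", "season": ["placeholder"], "buy_price": 0},
    {"name": "bass", "type": "fish", "weather": "placeholder", "location": "placeholder"},
    {"name": "breaded fish", "type": "recipe", "ingredients": ["bass", "wheat"]},
]


def find_by_type(docs, item_type):
    """In-memory stand-in for a query such as db.items.find({"type": item_type})."""
    return [doc for doc in docs if doc.get("type") == item_type]


recipes = find_by_type(items, "recipe")
fish_names = [doc["name"] for doc in find_by_type(items, "fish")]
```

Because every document carries the same `name` and `type` fields, an NPC's liked items can point at any of them without caring which category they belong to.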

I agree that schema-less is misleading and recommend using the phrase “flexible schema” instead. Data always has some shape or schema, even if not explicitly declared or enforced.

With traditional RDBMS, data models are designed based on how you plan to store the data and schema is generally rigid (all records have the same fields) and starts off highly normalised. This is a generalisation as some databases do support variant column types or arrays, and denormalisation is common practice for scaling & reporting. However, I’ve worked with a lot of SQL-based RDBMS systems (technically for much longer than MongoDB!) and one of my frustrations is the lack of consistency across SQL variations and the awkwardness of extended syntax once you get outside of the core relational model.

RDBMS are trying to evolve to suit modern data requirements with more flexibility and features like online DDL and JSON support, but are still informed by starting with design assumptions around rigidity and central catalogs. For an example of some developer challenges, I think this blog post by Buzz Moschetti is a great example that reflects some of my own experience: Postgres JSON, Developer Productivity, and The MongoDB Advantage.

With MongoDB, performant data models are designed based on how your application commonly uses data and can be refined as requirements change or are better understood. Schema is flexible (documents may have varying shapes) and appropriately denormalised (duplicating some data to optimise for common use cases). Developers can also avoid the overhead of translating between how they work with data via application objects or ORMs and how the data is actually stored.
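
A minimal sketch of what "appropriately denormalised" could mean for the recipe example from the first post: the recipe document embeds a small copy of each ingredient, so the most frequent lookup needs only one document. The field names and the helper function are assumptions, and the `type` values are illustrative only.

```python
# Hypothetical recipe document that embeds a short summary of each
# ingredient instead of only referencing it by id.
recipe = {
    "name": "strawberry ice cream",
    "method": "ice cream machine",
    "ingredients": [
        {"name": "strawberry", "type": "crop"},
        {"name": "milk", "type": "animal product"},
    ],
}


def resource_names(recipe_doc):
    """List the ingredient names embedded in a single recipe document."""
    return [ingredient["name"] for ingredient in recipe_doc["ingredients"]]


needed = resource_names(recipe)
```

The trade-off is that the embedded copies must be updated if the source item changes, which is usually acceptable for data that rarely changes.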

MongoDB’s schema validation approach is a good example of flexible versus rigid schema as well as designing for distributed deployments with minimal central dependencies.

With a fixed schema catalog (RDBMS):

There is a central schema catalog or data dictionary (typically INFORMATION_SCHEMA) which includes the schema definitions for all field types.
Schema alterations like adding or removing columns must be coordinated by modifying the central schema catalog and typically involve migrating all rows in the table to the expected schema. Migrations between schema versions can lead to scalability challenges when there are millions or billions of rows to update and schema changes are a blocking operation for concurrent reads & writes.
Some RDBMS products support some non-blocking schema maintenance (aka “Online DDL operations”), but these require opting-in via additional syntax as well as understanding usage and performance caveats associated with the underlying rigid schema structure. For example, see MySQL (InnoDB storage engine) Online DDL Syntax and Usage Notes and InnoDB Online DDL Limitations.

With MongoDB’s flexible schema:

Schema validation is optional and only applied when documents are inserted or updated. As a developer, you have more control over when or if older documents are migrated to a newer schema version if your use case expects certain document shapes.
BSON documents embed all of the information (field names and types) required to interpret a document – this is also known as “schema-on-read”. In a sharded deployment, documents can be inserted or migrated to different shards (data partitions) with minimal centralised dependencies beyond routing based on the destination collection and shard key.
Documents can have more complex shapes that directly map to application objects including embedded documents, arrays, and variant field types.
Hot tip: there is also a $jsonSchema query operator that can be used to find existing documents that match (or do not match) a schema validator.
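
A sketch of that hot tip, assuming a hypothetical `items` collection and a schema requiring `name` and `season`: the same schema document can be used as a query filter to find documents that conform, or wrapped in `$nor` to find those that do not.

```python
# Hypothetical schema: documents should have a string `name` and a `season`.
item_schema = {
    "required": ["name", "season"],
    "properties": {"name": {"bsonType": "string"}},
}

# Filter matching documents that satisfy the schema.
matching_filter = {"$jsonSchema": item_schema}

# Filter matching documents that do NOT satisfy the schema, which is useful
# for finding older documents before turning validation on.
non_matching_filter = {"$nor": [{"$jsonSchema": item_schema}]}

# With a running server and PyMongo (database name is an assumption):
#   from pymongo import MongoClient
#   db = MongoClient()["stardew"]
#   stale_docs = list(db.items.find(non_matching_filter))
```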

Regards,
Stennie

steevej · April 7, 2022, 12:33pm

I will definitely use flexible schema rather than schema-less in the future.

Thanks for the insight.