Basic Questions on MongoDB

|1|Scan rate. How fast is the search for certain size of data?|
|2|Does frequent update on the array of objects same as updating the ordinary key value pair?|
|3|What is the ideal server specs for one node? (CPU, mem, harddisk)|
|4|How much data each node can handle?|
|5|How can we optimize our search query? How should we design our schema for fast search?|
|6|Which do you recommend, a row with sub properties or simple row with sub properties on another table?|
|7|How should we troubleshoot our cluster in case we encounter abnormality?|
|8|Is there any limit in row/collection/node size?|
|9|How should we handle a table with increasing data everyday? What is the right structure for that?|
|10|Is there any restriction or known problem with regards to nested array?|

Hello @Amy_Jonson, welcome to the MongoDB Community forum!

These are anything but basic questions, though they are valid and are often asked on this forum and elsewhere. Each question really deserves a post of its own, which makes this thread unusual (I think).

I have posted some relevant information for each of the questions, along with some links to documentation. It is not very detailed. Please feel free to search for each of the topics; you will find detailed answers on this forum and elsewhere on the net.

There are also other resources, in addition to the documentation links. If you navigate to the top of this page (press the keyboard Home button), there is a horizontal menu with various links to blog posts, tutorials, podcasts, training videos, etc. One important resource is MongoDB University, where you will find courses on topics of interest such as Data Modeling and Performance.

In MongoDB, data is stored as fields and values within documents. A set of documents is stored together in a collection, and a group of collections is a database. There can be many databases in a MongoDB deployment. There are no rows and tables in MongoDB vocabulary: MongoDB has databases, collections and documents (these are analogous to databases, tables and rows in SQL/tabular databases).


|1| Scan rate. How fast is the search for certain size of data?

Search performance depends on many factors: the size of the data, the indexes on the search criteria fields, the hardware, and the query itself. A good response would be within 100 ms, I believe.


|2| Does frequent update on the array of objects same as updating the ordinary key value pair?

I think it is not the same, as they are different field types, and there are different operators for working with each. Array fields have their own set of operators for inserting, updating, querying, and deleting elements. How often each kind of field is updated depends on your application. In either case, an update operation on a single document is atomic, irrespective of the size and type of the field.

See Update - Atomicity.
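
To make the difference concrete, here is a minimal Python sketch of the filter and update documents involved. The document and its field names (`status`, `items`, `sku`, `qty`) are made up for illustration; `$set`, `$push`, and the positional `$` operator are standard MongoDB update operators. The driver call is left as a comment so the snippet runs without a server.

```python
# Hypothetical order document; every field name here is an assumption.
order = {
    "_id": 1,
    "status": "new",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
}

# 1) Updating an ordinary key-value pair.
plain_filter = {"_id": 1}
plain_update = {"$set": {"status": "shipped"}}

# 2) Updating one element in an array of objects.
#    The filter selects the element by its sku; the positional "$"
#    stands for the first array element that matched the filter.
array_filter = {"_id": 1, "items.sku": "A1"}
array_update = {"$set": {"items.$.qty": 5}}

# 3) Appending a new element to the array.
push_update = {"$push": {"items": {"sku": "C3", "qty": 4}}}

# With PyMongo and a running server, each pair would be applied as, e.g.:
#   collection.update_one(array_filter, array_update)
```

Each of these is a single-document update, so each one is applied atomically.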


|3| What is the ideal server specs for one node? (CPU, mem, harddisk)

This is determined by your application requirements. How much data are you storing? What kind of queries are you performing? What kind of data is it? How much money are you willing to spend? There are many factors like these.

See References below.


|4| How much data each node can handle?

This depends on your node’s specifications: how much disk space, RAM, CPU, etc., the node has. In general, the data size is limited by the file systems of the servers, because the database data is stored as files. By default, a collection creates at least two files: one for the data and another for the default _id field index. So, the number of files, the size of these files, and the resources needed to handle them are the main factors.

See References below.


|5| How can we optimize our search query? How should we design our schema for fast search?

There are query optimization techniques. Tools like explain generate query plans, which you can study to figure out how a query is performing. In general, proper use of indexes is an important aspect of query optimization, followed by how the query itself is constructed. The server hardware plays a part too (for example, the amount of RAM).

Schema design (or data modeling) also affects performance. So, the questions to ask are: what kind of data, what kind of application, how much data, what kind of operations, etc.

See Analyze Query Performance
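
As a rough illustration of reading a query plan, the sketch below walks the `queryPlanner.winningPlan` part of an explain result and reports whether the query falls back to a full collection scan (`COLLSCAN`) or uses an index (`IXSCAN`). The two explain documents are hand-written, trimmed examples rather than output from a real server, and the helper names are my own. Some server versions nest the plan differently, so treat this as a sketch under that assumption.

```python
def plan_stages(plan):
    """Return the stage names found in a plan, outermost first."""
    stages = []
    if "stage" in plan:
        stages.append(plan["stage"])
    if "inputStage" in plan:
        stages.extend(plan_stages(plan["inputStage"]))
    for child in plan.get("inputStages", []):
        stages.extend(plan_stages(child))
    return stages


def uses_collection_scan(explain_output):
    """True if the winning plan contains a COLLSCAN stage."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning)


# Hand-written, trimmed examples (assumptions, not real server output).
without_index = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}
with_index = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "status_1"},
        }
    }
}

print(uses_collection_scan(without_index))  # True
print(uses_collection_scan(with_index))     # False
```

A `COLLSCAN` on a large collection is usually the first thing to look for when a query is slow.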


|6| Which do you recommend, a row with sub properties or simple row with sub properties on another table?

This question is related to data modeling. You need to model the data based upon your application requirements. What are the relationships between the data entities (one-to-one, one-to-many, many-to-many)? What are the important operations you will be performing on this data? And, there are many other factors which influence the data design.

See Data Model Design - Embedded and Normalized
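
The two options can be sketched as plain Python dictionaries. All names here (`name`, `addresses`, `user_id`, `city`) are hypothetical; the point is only the shape of the data and how many lookups a read needs.

```python
# Embedded: the sub-properties live inside the parent document.
user_embedded = {
    "_id": 1,
    "name": "example user",
    "addresses": [{"city": "Tokyo"}, {"city": "Osaka"}],
}

# Normalized: the sub-properties live in a separate collection
# and point back to the parent through a reference field.
user = {"_id": 1, "name": "example user"}
addresses = [
    {"_id": 10, "user_id": 1, "city": "Tokyo"},
    {"_id": 11, "user_id": 1, "city": "Osaka"},
]

# Embedded: one document read gives everything.
cities_embedded = [a["city"] for a in user_embedded["addresses"]]

# Normalized: a second lookup (simulated here by a filter) is needed.
cities_normalized = [a["city"] for a in addresses if a["user_id"] == user["_id"]]

print(cities_embedded == cities_normalized)  # True
```

Embedding tends to suit data that is read together with its parent; references tend to suit sub-data that grows without bound or is shared between parents.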


|7| How should we troubleshoot our cluster in case we encounter abnormality?

What is the abnormality you have encountered? How to resolve it depends on the specific issue.


|8| Is there any limit in row/collection/node size?

A MongoDB document has a maximum size of 16 megabytes (MB). For larger data per document (for example, media files) there is a feature called GridFS, which allows storing data larger than 16 MB.

See GridFS and also References below.


|9| How should we handle a table with increasing data everyday? What is the right structure for that?

What data? How much data? What kind of operations? Without any details it is difficult to say anything. In general, there are design patterns which address various data and operational scenarios. These can be applied to some commonly occurring situations.

See Data Modeling Concepts topic Operational Factors and Data Models.


|10| Is there any restriction or known problem with regards to nested array?

In a MongoDB document, you can nest up to 100 levels. The size of the nested array(s) is limited by the document size limit of 16 MB. If an array would grow beyond that, you can model the data with references instead.
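
If you want to check a document's nesting before inserting it, a small helper like the following can do it on the application side. This is a sketch: the function name is my own, and it assumes every embedded object or array counts as one level, with the top-level document counted too.

```python
MAX_NESTING_LEVELS = 100  # the nesting limit mentioned above


def nesting_depth(value):
    """Count nesting levels: each dict or list adds one level."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0  # scalars add no level


# Hypothetical document: dict -> list -> dict -> list
doc = {"a": [{"b": [1, 2]}]}

print(nesting_depth(doc))                        # 4
print(nesting_depth(doc) <= MAX_NESTING_LEVELS)  # True
```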



References:

Hi Prasad_Saya,

Thank you for taking time to answer my questions.

Follow up questions:

For example, we are inserting 1 million documents of 21 KB each every day. We do 6 updates on the nested array of each document, with frequent searches. What would be the ideal server specs for that? Is it good to store all the documents in one single collection, or should we consider creating a new collection every day/week/month?

We are currently using Elasticsearch as data storage, but we found that frequent updates, especially in nested arrays, are costly. That is why we are considering MongoDB instead.

Hello @Amy_Jonson,

we are inserting 1 million documents of 21 KB each every day. Is it good to store all the documents in one single collection, or should we consider creating a new collection every day/week/month?

Some thoughts. You will have quite a bit of data after a year, and more after two. More data means more disk storage. Accessing more data in your application also means more RAM: the data and indexes used by your queries (the working set) need to fit in memory for good performance, because reading them from disk at query time is slow.
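
To put rough numbers on that, here is a back-of-the-envelope calculation from the figures in your question (1 million documents of 21 KB per day, 6 updates per document). It assumes 1 KB = 1024 bytes and a 365-day year, and it counts raw document size only: no compression, indexes, or replica copies, so real disk usage will differ. The rates are daily averages, not peaks.

```python
DOCS_PER_DAY = 1_000_000
DOC_SIZE_BYTES = 21 * 1024      # assumes 1 KB = 1024 bytes
UPDATES_PER_DOC = 6
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = DOCS_PER_DAY * DOC_SIZE_BYTES
bytes_per_year = bytes_per_day * 365
updates_per_day = DOCS_PER_DAY * UPDATES_PER_DOC

print(f"raw data per day:  {bytes_per_day / 1e9:.1f} GB")              # 21.5 GB
print(f"raw data per year: {bytes_per_year / 1e12:.2f} TB")            # 7.85 TB
print(f"average inserts/s: {DOCS_PER_DAY / SECONDS_PER_DAY:.1f}")      # 11.6
print(f"average updates/s: {updates_per_day / SECONDS_PER_DAY:.1f}")   # 69.4
```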

I don’t know your use case or your queries. Storing documents in multiple collections based on time is not a new concept. But you will also have to look at your queries, so that they do not routinely need to access two or more collections, for both practical and performance reasons. This will certainly require a closer study.
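
As a sketch of the time-based approach, the helpers below map a date to a monthly collection name and list the collections a date-range query would have to touch. The `events_YYYY_MM` naming scheme and the function names are assumptions for illustration, not a MongoDB convention.

```python
from datetime import date


def monthly_collection(day, prefix="events"):
    """Name of the (hypothetical) monthly collection holding this day."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


def collections_for_range(start, end, prefix="events"):
    """All monthly collections a query from start to end must read."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


print(monthly_collection(date(2024, 5, 17)))
# events_2024_05
print(collections_for_range(date(2024, 1, 28), date(2024, 2, 3)))
# ['events_2024_01', 'events_2024_02']
```

The second call shows the cost mentioned above: a one-week query that crosses a month boundary already has to read two collections and merge the results in the application.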


We do 6 updates on the nested array of each document and with frequent search.

MongoDB array fields can be indexed (these are called multikey indexes), and this can be useful for read and update queries that use an array field in the query filter.
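
The idea behind a multikey index is that each array element gets its own index entry pointing back to the document. The following is a conceptual model in plain Python, not how MongoDB implements it; the field names `items` and `sku` are made up, and the PyMongo calls are shown only as comments.

```python
def multikey_entries(docs, array_field, sub_field):
    """Conceptual model: one (key, _id) entry per array element, sorted by key."""
    entries = []
    for doc in docs:
        for element in doc.get(array_field, []):
            entries.append((element[sub_field], doc["_id"]))
    return sorted(entries)


# Hypothetical documents with a nested array of objects.
docs = [
    {"_id": 1, "items": [{"sku": "B2"}, {"sku": "A1"}]},
    {"_id": 2, "items": [{"sku": "A1"}]},
]

print(multikey_entries(docs, "items", "sku"))
# [('A1', 1), ('A1', 2), ('B2', 1)]

# With PyMongo and a running server, the real index and a query using it
# would look like:
#   collection.create_index([("items.sku", 1)])   # 1 = ascending
#   collection.find({"items.sku": "A1"})
```

One consequence is that index size grows with the number of array elements, not just the number of documents, which matters for the working-set estimate.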


What would be the ideal server specs for that?

I would think in terms of the working set, which gives an idea of how much RAM you might need, and then in terms of data and index sizes for the disk requirements. Please look into the finer details, especially how MongoDB uses the computer’s memory (see FAQ: MongoDB Storage).

Hi @Prasad_Saya,

Thank you so much for your help. We will take a look at your suggestions.

Regards,