Schema Validation: Good Practice or Not?

Vikram_Bade · September 2, 2021, 8:06am

I have a general question: is it good practice to include schema validation as part of your collection design, or is it better avoided? Given that MongoDB is flexible in its design, is it good practice to design your schema with reasonably strict validation rules, or will that be counterproductive? Should we leave validation to the application? Any suggestions or advice?

For example:

db.createCollection("CustomersTest", {
   validator: {
      $jsonSchema: {
         title: "Main Table for Customers",
         description: "This document records the details of a customer and demographics",
         bsonType: "object",
         required: [ "CustomerID", "FirstName", "LastName", "address" ],
         properties: {
            CustomerID: {
               bsonType: "int",
               description: "must be an integer and is required"
            },
            FirstName: {
               bsonType: "string",
               description: "must be a string and is required"
            },
            LastName: {
               bsonType: "string",
               description: "must be a string and is required"
            },
            MiddleName: {
               bsonType: "string",
               description: "must be a string if present"
            },
            address: {
               bsonType: "array",
               // required/properties only apply to objects, so each array
               // element is described through items
               items: {
                  bsonType: "object",
                  required: [ "addresstype", "address1", "city", "state", "zipcode" ],
                  properties: {
                     addresstype: { bsonType: "string" },
                     address1: { bsonType: "string" },
                     address2: { bsonType: "string" },
                     address3: { bsonType: "string" },
                     city: { bsonType: "string" },
                     state: { bsonType: "string" },
                     zipcode: { bsonType: "string" }
                  }
               }
            }
         }
      }
   }
})

Prasad_Saya · September 2, 2021, 9:20am

Hello @Vikram_Bade,

Schema Validation is an optional feature of MongoDB.

You can develop applications without this validation too. Ultimately, applications consume data, so some form of data validation is inevitable, whether the data is entered through the user interface or imported from another application. In a typical application, data validation happens at several levels: on the client, in the backend application, and at the database level. Schema validation operates at the database level.

Applications are written in a programming language, using drivers for Java, Node.js, Python, etc. Some of these languages have type-mapping libraries or tools, such as Mongoose ODM for Node.js or POJO mapping for Java. These mapping tools also enforce some validation of the data being read from and written to the database. So validation is not a special feature; it's part of an application. You can also write applications without these mapping tools, for example, by using the MongoDB Node.js driver directly.
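As a rough illustration of validating in application code without a mapping library, the sketch below checks a customer document in plain JavaScript before it would be handed to the driver. The function name `validateCustomer` and the exact rules are assumptions that loosely follow the fields of the example schema in the question; they are not part of any MongoDB API.

```javascript
// Hypothetical application-level check based on the CustomersTest fields.
// Returns a list of human-readable problems; an empty list means the
// document passed every check.
function validateCustomer(doc) {
  const errors = [];

  if (!Number.isInteger(doc.CustomerID)) {
    errors.push("CustomerID must be an integer");
  }
  for (const field of ["FirstName", "LastName"]) {
    if (typeof doc[field] !== "string") {
      errors.push(`${field} must be a string`);
    }
  }
  // MiddleName is optional, but must be a string when present.
  if (doc.MiddleName !== undefined && typeof doc.MiddleName !== "string") {
    errors.push("MiddleName must be a string");
  }

  if (!Array.isArray(doc.address)) {
    errors.push("address must be an array");
  } else {
    const requiredKeys = ["addresstype", "address1", "city", "state", "zipcode"];
    doc.address.forEach((entry, i) => {
      for (const key of requiredKeys) {
        if (typeof entry?.[key] !== "string") {
          errors.push(`address[${i}].${key} must be a string`);
        }
      }
    });
  }
  return errors;
}
```

An application could call `validateCustomer(doc)` before an insert and return the collected messages to the caller when the list is not empty.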

It is a matter of design and implementation choice. It can affect your application lifecycle, so it needs to be a conscious decision.

Vikram_Bade · September 2, 2021, 9:47am

Thank you @Prasad_Saya. I absolutely agree with you that this is a design decision made as part of application development.

When it comes to MongoDB, does validating at the database level add performance overhead? If so, is it significant enough to deter us from using database-level validation, or is it small enough that we can use it based on the application design?

Prasad_Saya · September 2, 2021, 9:53am

@Vikram_Bade, data validation is a requirement of the application. It should not be considered an ‘overhead’ - it's just part of the process of designing and developing an application (and the database is a part of it). Application (or database) performance overhead is not part of the design process here - it is a secondary concern that, generally, should not drive this decision.

Vikram_Bade · September 2, 2021, 10:36am

Thanks @Prasad_Saya. Understood and agreed.

Stennie_X · September 27, 2021, 5:52am

Hi @Vikram_Bade,

In general you should validate data as early as possible (i.e. client-side validation in your UI before sending to the application, and in application logic before sending to the database) to improve the user experience.

Sending a request to your application server (and in turn, from the app to the database) adds some UX latency because validation errors need to propagate back through the layers and be presented to the user to resolve.

However, you should also not assume that earlier validation was successful. For example, client-side validation might be bypassed by disabling JavaScript or posting directly to an application endpoint.

Validating at the database layer is useful to ensure your data matches the expected schema when data is inserted or updated. As @Prasad_Saya noted, validation would be used to meet an application design requirement for the correctness of data, so it is a decision based on necessity rather than overhead. If you don’t validate data, you will have to handle possible inconsistencies in your application code or through later data cleansing.
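To make the database-layer behaviour a little more concrete, here is a hedged sketch. MongoDB reports a write that fails a collection validator with server error code 121 (DocumentValidationFailure), and the `collMod` command accepts `validationLevel` and `validationAction` options to relax enforcement. The helper `isValidationFailure` and the constant `relaxValidationCommand` are hypothetical names used only for illustration; the collection name is taken from the question.

```javascript
// Server error code MongoDB reports when a write violates the collection validator.
const DOCUMENT_VALIDATION_FAILURE = 121;

// Hypothetical helper: true when a caught driver error looks like a
// schema-validation rejection (assumes the error exposes a numeric `code`).
function isValidationFailure(err) {
  return Boolean(err) && err.code === DOCUMENT_VALIDATION_FAILURE;
}

// Command document that could be passed to db.runCommand(...) in mongosh.
// "warn" logs violations instead of rejecting the write; "moderate" skips
// checks on updates to existing documents that do not already match the schema.
const relaxValidationCommand = {
  collMod: "CustomersTest",
  validationLevel: "moderate",
  validationAction: "warn",
};
```

In application code, a `catch` block around an insert could call `isValidationFailure(err)` to turn the rejection into a message for the user, while the relaxed settings may be useful when introducing validation on a collection that already holds inconsistent documents.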

Regards,
Stennie

Vikram_Bade · October 3, 2021, 3:28pm

Thanks @Stennie_X for your input. Agreed, and it makes sense.