validate() behavior not consistent with documentation

Problem:

The mongo documentation explains schema validation and the ability to validate existing documents in two places:

To perform validation checks on existing documents, use the validate command or the db.collection.validate() shell helper.

db.collection.validate() also validates any documents that violate the collection’s schema validation rules.

The output of validate() would then, in theory, report the number of documents that are invalid under the collection's current validator settings. That is not what I observe: after adding a validator to a collection whose existing documents do not satisfy it, validate() still reports all documents as valid.

In v5 and later it does report a warning and says to check the logs, but in v4 no information is reported at all.

Reproduce:

  1. Start a mongo instance
docker run \
    --name mongo5_0_validator_test \
    -d \
    --env=MONGO_INITDB_ROOT_USERNAME=admin \
    --env=MONGO_INITDB_ROOT_PASSWORD=password \
    mongo:5.0
  2. Connect to instance via CLI
docker exec -it mongo5_0_validator_test mongo --username admin --password password
  3. Create test data set, validator, and run validate
use test
db.col1.insert({ name: "joe" });
db.col1.insert({ namE: "bre" });
db.col1.insert({ test: "poe" });
db.runCommand({
    collMod: "col1",
    validator: {
        $jsonSchema: {
            bsonType: "object",
            properties: {
              "_id": { bsonType: "objectId" },
              name: { bsonType: "string", description: "test" }
            },
            additionalProperties: false
        }
    },
    validationLevel: "strict",
    validationAction: "error"
});
db.col1.insert({ name: "jack" });
db.col1.insert({ namE: "this throws an error because 'namE' is not a defined property, we still only have 4 documents now, 2 existing are invalid to the schema pre validator addition." });
db.col1.validate()
  4. Observe validate output

mongo v5 output

In v5 of mongo we at least get a warning telling us that some documents are actually invalid, but nInvalidDocuments is still 0.

{
    "ns" : "test.col1",
    "nInvalidDocuments" : 0,
    "nrecords" : 4,
    "nIndexes" : 1,
    "keysPerIndex" : {
        "_id_" : 4
    },
    "indexDetails" : {
        "_id_" : {
            "valid" : true
        }
    },
    "valid" : true,
    "repaired" : false,
    "warnings" : [
        "Detected one or more documents not compliant with the collection's schema. See logs."
    ],
    "errors" : [ ],
    "extraIndexEntries" : [ ],
    "missingIndexEntries" : [ ],
    "corruptRecords" : [ ],
    "ok" : 1
}

mongo v4 output

You’ll notice in v4 there isn’t even any hint that anything may be wrong with the documents.

{
    "ns" : "test.col1",
    "nInvalidDocuments" : 0,
    "nrecords" : 4,
    "nIndexes" : 1,
    "keysPerIndex" : {
        "_id_" : 4
    },
    "indexDetails" : {
        "_id_" : {
            "valid" : true
        }
    },
    "valid" : true,
    "warnings" : [ ],
    "errors" : [ ],
    "extraIndexEntries" : [ ],
    "missingIndexEntries" : [ ],
    "ok" : 1
}

Expected Behavior

I expect, based on the mongo documentation, that nInvalidDocuments would report the number of documents that failed validation. There does not appear to be any good way to identify the invalid documents without looking at the database logs, which is not very useful.

The Question to Answer

How can you determine all existing invalid documents in a collection that has had a validator added or updated?

Is there any way to iterate over the collection and validate each document in lieu of this behavior, especially if this behavior is actually expected?

Possible solutions

The only idea we had in the discord conversation was creating an entire new collection with the new validation on it and bulk inserting the old collection into it to see what fails. This is obviously not an ideal scenario.

It is also not great or appropriate for application logic to do the validation since there is no straightforward way to go from mongo’s custom bson jsonSchema to whatever language/driver you’re using.

I also did not find anything on validating a single document. Only the full-collection validate method exists, which rules out using some kind of pipeline/match to find all invalid documents.

Previous Conversations:

(I can’t put valid links here because the forums block me; I figured linking the docs above was a better use of my 2 possible links…)

  • I originally asked in the community discord, some discussion and confirmation from another member experiencing the same behavior: https (colon) // discord (dot) com/channels/714857985389625415/714857985943535659/979451499409190912
  • An old forum post looked like it was similar issues, asking what is expected out of validate call: https (colon) // www.mongodb (dot) com/community/forums/t/performing-document-validation-from-a-driver-and-db-collection-validate-return-issues/128228

Welcome to the MongoDB Community @joshua_bell !

The change in validation output and db.collection.validate() behaviour is due to Improved Error Messages for Schema Validation in MongoDB 5.0.

You haven’t mentioned which release of MongoDB 4.x, but the major release series are 4.0 (now end of life), 4.2, and 4.4. New features and compatibility changes are only introduced in new major releases, so referring to x.y is more meaningful than vX in terms of common behaviour.

The major versioning scheme changed as of MongoDB 5 (Accelerating Delivery with a New Quarterly Release Cycle, Starting with MongoDB 5.0) so major production releases are now annual (5.0, 6.0, …). There are also quarterly rapid releases (5.1, 5.2, 5.3) which are development previews leading up to the next major release (X.0).

You can use the $jsonSchema query operator to find existing documents that do (or do not) satisfy the criteria for a validator. The $jsonSchema operator requires a JSON schema definition as a parameter, so you can include the current collection validator or a custom one.
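As a minimal sketch of that approach (not an official recipe), the snippet below wraps $jsonSchema in $nor so the query matches documents that fail the schema. The schema is copied from the collMod command in the original post. The nonCompliantFilter helper name is hypothetical, and the `db` check is only there so the snippet can also be loaded outside the shell.

```javascript
// Schema copied from the collMod command in the original post.
const schema = {
    bsonType: "object",
    properties: {
        _id: { bsonType: "objectId" },
        name: { bsonType: "string", description: "test" }
    },
    additionalProperties: false
};

// Hypothetical helper: builds a filter that matches documents
// which do NOT satisfy the given JSON schema.
function nonCompliantFilter(jsonSchema) {
    return { $nor: [{ $jsonSchema: jsonSchema }] };
}

// `db` only exists inside the mongo shell, so guard the query.
if (typeof db !== "undefined") {
    db.col1.find(nonCompliantFilter(schema)).forEach(printjson);
}
```

With the test data from the original post, this should match the two documents inserted before the validator was added ({ namE: "bre" } and { test: "poe" }), since additionalProperties is false.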

Regards,
Stennie

Thanks for the info. Those runs were with v4.4 and v5.0 (the last two docker images released on Docker Hub).

I didn’t realize you can use the $jsonSchema as a query operator so that is good to know. Sounds like that will be the only way to detect existing invalid records in a collection.
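For anyone landing here later, a rough sketch of how the validator already stored on the collection could be reused instead of retyping the schema. It assumes the collection is named col1 as in my repro and that its validator is a plain query expression such as { $jsonSchema: ... }. The filterFromCollectionInfo helper name is made up for illustration.

```javascript
// Hypothetical helper: given one entry returned by db.getCollectionInfos(),
// build a filter matching documents that do NOT satisfy the stored validator.
// Returns null when the collection has no validator.
function filterFromCollectionInfo(info) {
    const validator = info && info.options && info.options.validator;
    if (!validator) {
        return null;
    }
    return { $nor: [validator] };
}

// `db` only exists inside the mongo shell, so guard the queries.
if (typeof db !== "undefined") {
    const info = db.getCollectionInfos({ name: "col1" })[0];
    const filter = filterFromCollectionInfo(info);
    if (filter) {
        // Number of existing documents that fail the current validator.
        print(db.col1.countDocuments(filter));
        // The failing documents themselves.
        db.col1.find(filter).forEach(printjson);
    }
}
```

This way the check always uses whatever validator is currently set, so it does not drift if the validator is updated again with collMod.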

Can you explain what "nInvalidDocuments" actually means, though? Based on the current docs, the $jsonSchema validator should be run when calling collection.validate. It clearly runs at some point in v5 because it prints the warning, but I would expect the invalid document count to include those documents because the validator is set to strict with the error action, not warn. Is the behavior of validate better documented anywhere? The description for nInvalidDocuments does not talk about edge cases that go against the quoted docs at the top of my original post. Those quotes seem misleading if the validate function does not, in fact, validate existing documents.

Hi @joshua_bell,

db.collection.validate() is a wrapper around the validate command. The current documentation definitely needs some improvements:

  • Prior to MongoDB 5.0, validate was focused on the structural integrity of indexes and documents so the nInvalidDocuments is reporting documents that cannot be read by the underlying storage engine (for example, document corruption with mismatched document vs data size).

  • In MongoDB 5.0+, validate checks for non-compliant documents and should include more details in the logs about non-compliant documents. Documents that are not compliant with the schema are not counted as invalid documents.

  • In the upcoming MongoDB 6.0 release, validate has a new nNonCompliantDocuments counter.

I added an improvement suggestion for the documentation: DOCS-15364: Add more detail on what validate considers “invalid documents”.

I hope that clarifies the expected outcomes. Please also feel free to comment/upvote/create DOCS tickets in the MongoDB Jira issue tracker (or provide feedback to the team directly via a documentation page). Discussion and feedback in the forums is also an option.

Regards,
Stennie

Thank you for the explanation and getting a ticket made for the docs, that makes a lot of sense. That clarifies everything I need to know. Looking forward to a v6 with the compliant documents counter.
