Ensure that two fields in a collection always have a 1:1 mapping

Collection c:
Doc1:
    {
        field1: "A",
        field2: "B",
        ...
    }

Is there a way to make sure that whenever field1 = "A", then field2 is always "B"?
And vice-versa?

Hi @Gaurav_Khanna,

I haven’t tried it myself, but I’d look in the direction of the $jsonSchema validator and the enum keyword.

You could change the doc design so it looks like:

{
  subDoc : {
    field1: "A",
    field2: "B"
  }
}

That way you can add an enum constraint on the subDoc field and list all the allowed pairs in the $jsonSchema, as sketched below.
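For example, here is a minimal sketch of what that validator could look like (collection name and pair values are illustrative, and again, I haven’t tested this):

db.createCollection("c", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["subDoc"],
      properties: {
        // enum accepts whole documents, so each allowed
        // { field1, field2 } pair is listed as one value
        subDoc: {
          enum: [
            { field1: "A", field2: "B" },
            { field1: "C", field2: "D" }
          ]
        }
      }
    }
  }
})

(I’d double-check how the field order inside subDoc affects the comparison before relying on it.)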

Let me know if that works.

Cheers,
Maxime.

If the values were known in advance, then $jsonSchema would help.

The values are dynamic. So if there exists:
field1: "A" (or any dynamic value)
then
field2: "B" (always, but note that B is also a dynamic value)

Then how do you want to enforce this if you have no idea which values you are about to insert? You’ve lost me. My supposition was that you had a list of predefined values to control this.

To be clear, my understanding was that the value of field1 was limited to 10 different values, field2 to some other 10 values, and you had to enforce that A1 => B1, A2 => B2, etc.

But if you don’t have the mapping nor the exhaustive list of all the possibilities, I don’t understand how you could enforce a mapping.


If field1=A already exists and maps to field2=B, then any document that is subsequently inserted with field1=A must have the corresponding field2 equal to B (field2=B); if not, that is an error condition.

I believe SQL databases (not that I want to compare the two, but to highlight how they do it) can use a pre-insert trigger.

Hopefully this is more helpful.

I have no idea how to do that without some back-end code then. I can imagine a bunch of different solutions using (or not) Realm Triggers, or Change Streams directly, to build a reference collection on the side, plus a caching system so we don’t query this collection a bazillion times per minute.
But that’s definitely a bit tedious to build.
I can’t imagine a “pure” MongoDB solution, to be honest.

Pre-insert triggers don’t exist, so we can’t do that.
You could send your write operation to a Realm Function using a webhook. The function would check whether these fields A & B already exist and either perform the write operation for you, or answer with an error code because the rule has been violated. I think that would be the easiest solution to implement - but it would definitely have an impact on insertion speed. And like I said, it’s using a back-end piece - so I’m “cheating” a bit…

I raised this with my team to see if someone has a better idea.

Cheers,
Maxime.

Hi @Gaurav_Khanna,

This is an absolutely fantastic question, which has already caused me to flood my kitchen today while thinking about the answer. In short the answer is “Yes”, but that’s a spoiler alert, as the method leads to questions which then get down to the underlying philosophy of document model databases. There is probably blog post material in this, but I’m going to walk you through it.

Databases like to ensure good throughput AND data correctness. Different database technologies (i.e. RDBMS and document models) apply different techniques to achieve these. When I talk about performance we have to think about performance under load - throughput, rather than the latency of a single query - which means minimising contention and locking. Making a DBMS for one user is easy; making one that is correct and scales with concurrent users is much, much harder.

You are looking to implement a constraint; an initial assumption would be that this is a server-side, write-time, enforced constraint. But MongoDB understands that server-side write-time constraints (including transactions) have some very negative consequences, which are one of the limiting factors of traditional RDBMS performance. Ultimately this boils down to contention between operations, and its concrete manifestation, locking: something on the server cannot go ahead because something else is happening and we need to wait for that to finish. If my server-side operation needs a set of conditions to be true then I cannot allow those to vary during my operation and must keep them locked. If things can’t change then others cannot make changes.

Simply put, if your constraints take a measurable period of time and apply to anything in the database other than the record you are modifying, then you risk limiting performance through locking and having to deal with deadlocks. We already see this with multi-document transactions; this is how RDBMS work, but it’s not the fast, contention-free future document databases can offer.

In a document database you either need to keep the constraints restricted to the single document you are explicitly changing (for example, document validation) or to very simple constraints on explicit index keys you are already using (a unique index constraint). Implementing anything else leads to server-side contention, locking, and potential performance issues, and so we take a different approach.
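For reference, the second of those mechanisms looks like this (collection and key names are illustrative):

// A unique index is the one cross-document constraint that stays cheap,
// because it is enforced on an index key the write already touches
db.pairs.createIndex({ pairKey: 1 }, { unique: true })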

MongoDB developed a set of minimal operations that allow the developer to implement complex logical operations on single documents whilst retaining correctness - it’s a clever compare-and-set variant. We can use these to manage our constraints. Simply doing pre-checks with find() alone does not work, as you can get race conditions: you cannot simply query field a and check the value of b before inserting, because what if two people do that at the same time? Fortunately the updateOne and findOneAndUpdate functionality in MongoDB does allow these types of check - and a way to do it safely without causing server contention or locking.
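To illustrate the race (the values are just examples), this is the pattern that does NOT work:

// UNSAFE check-then-act: two clients can both pass the findOne()
// before either inserts, leaving conflicting pairs in the collection
if (db.mydocs.findOne({ a: "Scotland", b: { $ne: "Whisky" } }) === null) {
    // another client may insert { a: "Scotland", b: "Gin" } right here
    db.mydocs.insertOne({ d: "Some data", a: "Scotland", b: "Whisky" })
}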

Hopefully you’re still reading, probably just looking for your answer, but if I got that right you won’t need to come back and ask why this is the answer and why we don’t just have before-insert triggers or more constraints.

Your constraint can be written as:

“When inserting record R, for any value of R.a there must not exist any other document in the database with the same value of R.a and a value of R.b other than the one in this document.”

With MongoDB the trick is to explicitly verify the constraints before you perform your operation, with a view to ensuring you don’t hold anything locked in the process.

To do this we can use updateOne (and sometimes findOneAndUpdate, if we need to know what already exists) with upsert and a unique constraint. As R.a is not unique on its own in the main collection, we need a second, very small collection where R.a is the _id. This is because an upsert does not itself lock anything between its update and insert operations, so two simultaneous upserts need a unique constraint to stop them both inserting.

Here is an example.


//Check we don't have 'a' used with any other value of 'b',
// and create the pair if we don't.
// R.a must be unique - so we put it into _id (which always has a unique index)
// The $set is a no-op; the real write happens via $setOnInsert
function ValidateABPair(ra, rb) {
    try {
        res = db.pairs.updateOne({
            _id: ra,
            b: {
                $ne: rb
            }
        }, {
            $set: {
                _id: ra
            },
            $setOnInsert: {
                b: rb
            }
        }, {
            upsert: true
        })

        if (res.matchedCount == 0) {
            //We found nothing for _id:ra, so we can now insert or update on
            //our main collection. We (probably) upserted { _id: ra, b: rb }
            return true;

        } else if (res.matchedCount == 1) {
            //We found a value for b that does not match our rb, so do nothing
            return false;
        }

    } catch (ex) {
        if (ex.code != 11000) throw ex;

        // We got a unique constraint (duplicate key) exception, so either
        // a document matching _id:ra already exists and the upsert tried to
        // re-insert it, OR someone beat us to the upsert and there is now a
        // different value for R.b (or the same one).
        // Either way we know there IS now a document with this a, so we check it
        if (db.pairs.findOne({
                _id: ra,
                b: rb
            })) {
            //Go ahead and do our write
            return true;
        } else {
            //Do not do our write
            return false;
        }
    }
}


R = { d: "Some data",
           a: "Scotland",
           b: "Whisky" }


if( ValidateABPair(R.a,R.b) ) {
	db.mydocs.insertOne(R)
}
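And to show the rejection path (my own illustrative values), a conflicting pair then fails the check:

R2 = { d: "More data",
       a: "Scotland",
       b: "Gin" }

if (ValidateABPair(R2.a, R2.b)) {
    // not reached - "Scotland" already maps to "Whisky", so
    // ValidateABPair returns false and the insert is skipped
    db.mydocs.insertOne(R2)
}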

Hope this helps - it seems complex, but despite the additional calls to the server it’s likely more efficient overall. You can optimize it a little, if an existing (a) is the more likely case, by doing the check that is currently in the exception handler FIRST. This version assumes new values of a are more common than existing ones.
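A sketch of that variant (my own naming, same logic reordered): check the pairs collection first, and only fall back to the upsert when the pair isn’t found.

function ValidateABPairExistingFirst(ra, rb) {
    // Fast path: if a pair for ra already exists, no write is needed
    var existing = db.pairs.findOne({ _id: ra });
    if (existing) return existing.b == rb;
    // Slow path: the upsert-based check above handles the race where
    // another client creates the pair between our findOne and now
    return ValidateABPair(ra, rb);
}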

– John


Thanks for the detailed answer. That is very helpful. And I know of a plumber in case the kitchen issue needs resolving. :slight_smile:

