Check for missing field or null value in mongoDB atlas

I am using mongodb atlas for full text search. My sample collection looks like this :

{
  "_id": "62fdfd7518da050007f035c5",
  "expiryDate": "2022-08-18T23:59:59+05:30",
  "arrayField" : ['abc', 'def', 'ghi', 'jkl']
},
{
  "_id": "62fdfd7518da050007f035c6",
  "expiryDate": null,
  "arrayField" : ['abc','jkl']
},
{
  "_id": "62fdfd7518da050007f035c7",
  "arrayField" : []
},
{
  "_id": "62fdfd7518da050007f035c8",
  "expiryDate": null
}

expiryDate is a Date type field and arrayField is an Array type field.
My goal is to get all documents where either :

  1. expiryDate doesn’t exists OR
  2. If expiryDate does exists, then it must be null OR
  3. If expiryDate does exists, then it must greater than current time.

My current atlas aggregation looks like :

{
    'compound' : {
        'should' : [
            {
                'compound' : {
                    'mustNot' : [{
                        "exists": {
                            "path": "expiryDate",
                        }
                    }]
                }
            },
            {
                "range": {
                    "path": "expiryDate",
                    'gte': new Date()
                }
            }
        ],
        'minimumShouldMatch' : 1
    }
}

This is not returning all documents where the expiryDate field have null value and it is only matching one clause of should where expiryDate is greater than or equal to current time. I want it to return all those documents too where the expiryDate is null .

Though i know having a null field is bad db design but is there a solution to this problem ?

Hi @pawan_saxena1 - Welcome to the community.

Can you advise what your goal is here? Is it perhaps to view all documents with an non-existing or invalid "expiryDate"?

In saying so, maybe utilising $search may not be required here. You could create an index on the "expiryDate" field and then possibly use the following if it suits your use case:

db.collection.find({
  $or: [
    {expiryDate:{$exists:false}},
    {expiryDate:null},
    {expiryDate:{$gte:ISODate("2022-01-01")}} /// <--- Assuming the current time is ISODate("2022-01-01")
  ]
})
[
  {
    _id: '62fdfd7518da050007f035c6',
    expiryDate: null,
    arrayField: [ 'abc', 'jkl' ]
  },
  { _id: '62fdfd7518da050007f035c7', arrayField: [] },
  { _id: '62fdfd7518da050007f035c8', expiryDate: null },
  {
    _id: '62fdfd7518da050007f035c5',
    expiryDate: ISODate("2022-08-18T23:59:59.000Z"), /// <---- NOTE: My example has this as a date value
    arrayField: [ 'abc', 'def', 'ghi', 'jkl' ]
  }
]

Please note the first document in your sample data has the "expiryDate" as a string value rather than date value used in the example above.

If this isn’t what you’re after, please advise the following:

  1. Use case details.
  2. If the "expiryDate" data type is supposed to be string in your first sample document.
  3. MongoDB version in use.
  4. expected document output based off your sample documents.

Regards,
Jason

2 Likes

Hi @Jason_Tran I have already put the requirements in the question. Please go through the question again.
I will put requirements again :
My goal is to get all documents:

  1. use $search as i am using atlas and not mongodb
  2. expiryDate doesn’t exists OR
  3. If expiryDate does exists, then it must be null OR
  4. If expiryDate does exists, then it must greater than current time.
  1. use $search as i am using atlas and not mongodb

I understand those are your requirements. The alternate I have suggested runs in both in Atlas and non-Atlas deployments of MongoDB. To be more specific, i’m curious to what the 3 conditions equate to as a goal - I.e. Do those 3 requirements equate to documents that have an invalid expiry date from your application’s perspective?

In saying so, there a few ways that should be able to get the document(s) you’re after although i’m not sure if you’re wanting to use $search ONLY, whether additional stages are allowed, etc.

For example, if you’re only wanting to use the $search stage and nothing else, you may be able to get these documents using the below example:

DB> var a = 
{
  '$search': {
    index: 'default',
    compound: {
      should: [
        {
          range: {
            path: 'expiryDate',
            gte: ISODate("2022-01-01T00:00:00.000Z") /// <--- Assuming this is the current date for example purposes. Modify as required.
          }
        },
        {
          compound: {
            mustNot: [ { exists: { path: 'expiryDate' } } ]
          }
        },
        {
          compound: {
            must: [ { exists: { path: 'expiryDate' } } ],
            mustNot: [
              {
                range: {
                  path: 'expiryDate',
                  lt: 1 /// <---- null is less than 1 based off the below mongosh output
                }
              }
            ]
          }
        }
      ]
    }
  }
}

The oddity here is that i’ve used a range of less than 1 to try and retrieve null values on "expiryDate". However, i’d recommend also voting for the following feedback related to indexing null data types. The mongosh output for testing if null is greater than 1:

DB> null < 1
true

Similar to the above example I provided earlier, you can also obtain the same document(s) using a $match stage equivalent to my .find() example:

var b =
{
  '$match': {
    '$or': [
      { expiryDate: { '$exists': false } },
      { expiryDate: null },
      { expiryDate: { '$gte': ISODate("2022-01-01T00:00:00.000Z") } } /// <--- Assuming this is the current date for example purposes. Modify as required.
    ]
  }
}

Output:

DB> db.collection.aggregate(b)
[
  {
    _id: '62fdfd7518da050007f035c6',
    expiryDate: null,
    arrayField: [ 'abc', 'jkl' ]
  },
  { _id: '62fdfd7518da050007f035c7', arrayField: [] },
  { _id: '62fdfd7518da050007f035c8', expiryDate: null },
  {
    _id: '62fdfd7518da050007f035c5',
    expiryDate: ISODate("2022-08-18T23:59:59.000Z"),
    arrayField: [ 'abc', 'def', 'ghi', 'jkl' ]
  }
]

You could also do a combination of a $search stage that gets the path if it exists or not followed by a $match stage for documents that have an "expiryDate" field value that is greater than the specified date or equal to null. Although there would be no index usage here for the $match stage that follows.

Please note:

  • Examples above are only based off the 4 sample documents
  • Please test thoroughly in and verify it suits all use case / requirements.
  • "expiryDate" field in 4 sample documents tested do not contain any String value types.

Regards,
Jason

2 Likes

Hi @Jason_Tran .
Thank you for taking out the time.
I went through the docs of atlas and found that we should avoid using $match along with $search : * Query Performance , maybe in future the product gets evolved so much that it wouldn’t matter but for now the $match impacts the performance a lot.

I did a example benchmark on the collection, it have more than 70 million records and :

  1. Without $match the execution time : 51ms
  2. With $match alongwith $search : 893ms

Now that being tested, i am planning to stick with only $search and above resolution works : execution time : 90ms

The above resolution depicts that we used range operator (which can be used for numbers and date values).
I have one other use case where the types of fields are String and Array.

How can i achieve same thing for string and array types also ?

{
  "_id": "62fdfd7518da050007f035c5",
  "expiryDate": "2022-08-18T23:59:59+05:30",
  "arrayField" : ['abc', 'def', 'ghi', 'jkl']
},
{
  "_id": "62fdfd7518da050007f035c6",
  "expiryDate": null,
  "arrayField" : ['abc','jkl']
},
{
  "_id": "62fdfd7518da050007f035c7",
  "arrayField" : []
  "status" : ""
},
{
  "_id": "62fdfd7518da050007f035c8",
  "expiryDate": null,
  "status" : null
}

arrayField is an array of objectIds : [ObjectId(123…), ObjectId(456…)]

My Goal is :

  1. arrayField doesn’t exists OR
  2. If arrayField does exists, then it must be empty array or null OR
  3. If arrayField does exists, then it must be some specified value

My current query for arrayField is :

{
    "should" : [
        {
            "compound": {
                "mustNot": [{
                    'exists': {
                        'path': 'arrayField'
                    }
                }]
            }
        },
        { // <--- i assume it checks for empty array like []
            "compound": {
                "must": [{
                    'exists': {
                        'path': 'arrayField'
                    }
                }],
                "mustNot": [
                    {
                        "exists": {
                            "path": "arrayField.0"
                        }
                    }]
            }
        },
        {
            "equals": {
                "value": mongoose.Types.ObjectId(someExampleValue),
                "path": "arrayField"
            }
        }
    ],
        "minimumShouldMatch" : 1
}

Problem with above query is it checks for empty array , but not for null.
Same case for string type also.

1 Like

Thanks for getting back to me. You’ve stated the performance differences without $match and ($search with $match). However, I am curious as to why you are not using just a $match by itself (i.e. no $search) as shown in my previous example? This should be able to utilise indexes as well and would most likely be able work for you string and array types question.

Regards,
Jason

Hi @Jason_Tran i cannot use $match alone as it won’t uitlise the index mappings defined in atlas. The FTS mappings can only be leveraged using $search and $searchMeta :

i cannot use $match alone as it won’t uitlise the index mappings defined in atlas.

You could create an index on the expiryDate field for the example on this post. You can then do the same for your array and string type fields.

The documents can then be found using $match alone. E.g. (from previous reply of mine):

db.collection.aggregate({
  '$match': {
    '$or': [
      { expiryDate: { '$exists': false } },
      { expiryDate: null },
      { expiryDate: { '$gte': ISODate("2022-01-01T00:00:00.000Z") } } /// <--- Assuming this is the current date for example purposes. Modify as required.
    ]
  }
})

Have you compared the performance of a standard index and $match alone versus the Atlas search index w/ $search alone to see if it suits your use case? I’ve not yet tested this myself but if your concern is utilisation of indexes then you can create the index mentioned and run an db.collection.explain(“executionStats”) to see the results where there should be some level of index usage.

Regards,
Jason

1 Like

Yes i can do that , but the $match doesn’t support full text search like $search does. query is a small part of a very big query which also include the search on text , date, string, number, objectId types and uses custom analyzers, combinations of $must, $mustNot, $should, $filter etc. I can use compound in $search but not in $match.

I also agree that i can replicate the whole query (and i did it earlier), the query then spans to 4000 lines and very diffcult to maintain. using $search i can write much cleaner and self explanatory query (without having to deal with deep nesting of $or and $and).

But above all, I was curious that if the $search syntax is promoted so much in atlas docs, there must be a way which i might be missing :slight_smile:

I tried and failed so can you please check once more for the arrayField as i cannot use range or equals there.

1 Like

query is a small part of a very big query which also include the search on text , date, string, number, objectId types and uses custom analyzers, combinations of $must, $mustNot, $should, $filter etc. I can use compound in $search but not in $match.

Thanks for providing those details and clarifying the use case although unfortunately I’m not aware of how this can be done for array fields containing null value(s) off the top of my head or if it can be done at all.

I can also see you have raised another post in regards to this so I will also go ahead and close this post off since we have discussed a few possible solutions / workarounds specific to this post.

Regards,
Jason

2 Likes