$ne and $regex does not use proper index

Hi everyone,

I am searching on more than 3 million records but fetching the data is very slow despite of using the indexes.

The problem here is $ne and $regex. While using other method like $eq, $in, $gt, $lt it is working fine and fetching the data very fast because it is using the correct index but while using the same query with $ne and $regex, it is taking more than 1 minute which is not feasible for my case as user wants the data within 3-4 seconds.

I have spent a lot of time in creating the proper index and by using the other methods it is also using the same index which I want but for $ne and $regex, it is behaving very different.

I am not able to figure out what to do to overcome this.

The problem with $ne and $regex is that it needs to scan a lot of data.

One scenario as an example :
user searches for email : {$ne:“abc@gmail.com”}, so in this case it must find the email not equal to “abc@gmail.com” so a lot of emails are there to scan so it is scanning so many docs which is taking more than 1 min so can you please suggest what to do?

Execution stats for such scenario :

"executionStats" : {
                    "executionSuccess" : true,
                    "nReturned" : 773918.0,
                    "executionTimeMillis" : 8440.0,
                    "totalKeysExamined" : 773927.0,
                    "totalDocsExamined" : 773927.0,
}
 "winningPlan" : {
                        "stage" : "PROJECTION_DEFAULT",
                        "transformBy" : {
                            "actor.user.email" : 1.0,
                            "actor.user.name" : 1.0,
                            "actor.user.uuid" : 1.0,
                            "time" : 1.0,
                            "_id" : 0.0
                        },
                        "inputStage" : {
                            "stage" : "FETCH",
                            "filter" : {
                                "$and" : [
                                    {
                                        "actor.user.email" : {
                                            "$not" : {
                                                "$eq" : "aaron.barnett@example.com"
                                            }
                                        }
                                    },
                                    {
                                        "customer.uuid" : {
                                            "$eq" : "039e5026-be55-4ec6-a0ac-5ec82be0313b"
                                        }
                                    },
                                    {
                                        "sequr_code" : {
                                            "$eq" : "SEQUR_ACCESS_GRANTED"
                                        }
                                    }
                                ]
                            },
                            "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                    "location.uuid" : 1.0,
                                    "time" : -1.0
                                },
                                "indexName" : "location_uuid",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                    "location.uuid" : [

                                    ],
                                    "time" : [

                                    ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2.0,
                                "direction" : "forward",
                                "indexBounds" : {
                                    "location.uuid" : [
                                        "[\"9eb44f41-3788-4b5b-b8f1-5f12b3956f69\", \"9eb44f41-3788-4b5b-b8f1-5f12b3956f69\"]"
                                    ],
                                    "time" : [
                                        "[new Date(1666411200000), new Date(1666238400000)]"
                                    ]
                                }
                            }
                        }
                    },

Thanks.

It would be interesting to see the indexes that you have because while you get an IXSCAN, then index used has none of the fields specified in your query. It would be very useful to see the exact query that you run. Do you provide an index hint or something like this? Do you have a $sort stage on location.uuid and time or something that would force to use the index location_uuid?

Note that nReturned is 773_918 which is quite a big number of documents which means the query is not really selective.

I am not too sure about

because nReturned is pretty close to totalKeysExamined.

There is not much more to say without more information.

Your problem might also be a lack of resources. What is the total size of your data? What is the RAM size? Disk type and size? Are the client and servers on the same machines? If not how are they connected?

This is the aggregation query I am using:

 [
        {
            $match: {
               
                $and: [

                    {"actor.user.name":{$ne:"Adam Prescott"}}, 
                    { "actor.user.email": { $regex: "aaron.barnett@example.com", $options:'i' } },
                   { "card.card_number": { $eq: "12006461200" } },
                    { "actor.user.access_groups.name": { '$eq': "Admin" } },
                    { "location.uuid": { $in: ["9eb44f41-3788-4b5b-b8f1-5f12b3956f69"] } },
                    { 'customer.uuid': { $eq: "039e5026-be55-4ec6-a0ac-5ec82be0313b" } },
                     {"custom_attributes.key":{$regex:"licens"}},


                ],
                sequr_code: "SEQUR_ACCESS_GRANTED",

                time: {
                    $gte: new Date("2022-10-16T04:00:00.000Z"),
                    $lte: new Date("2022-10-22T04:00:00.000Z"),
                },
            },
        },
        {
            $group: {
                _id: {
                    user_uuid: "$actor.user.uuid",

                    date_string: {
                        $dateToString: {
                            format: "%Y-%m-%d",
                            date: {
                                $toDate: "$time",
                            },
                            timezone: "America/New_York",
                        },
                    },
                },
                exit_time: {
                    $max: "$time",
                },
                entry_time: {
                    $min: "$time",
                },
                records: {
                    $first: {
                        // 'time': '$time',
                        user_name: "$actor.user.name",
                        email: "$actor.user.email"
                    },
                },
            },
        },
        {
            $sort: {

                "_id.date_string": 1,
                // time:1,
                "records.user_name": 1,
                "_id.user_uuid": 1,
            },
        },
        { $limit: 10 },

    ]

I need to first match all the fields as per the user input and then actual searching starts.

This is the sample doc with dummy data:

{
  "_id": {
    "$oid": "6375c63bcaa9987618048048"
  },
  "scp_reply": {
    "nSCPNumber": 164,
    "type": 7,
    "ser_num": 46,
    "time": 1654061180,
    "source_type": 9,
    "source_number": 0,
    "tran_type": 22,
    "tran_code": 13,
    "format_number": 3,
    "cardholder_id": "1654061180",
    "floor_number": 0,
    "card_type_flags": 0,
    "elev_cab": 0
  },
  "cameras": [],
  "customer": {
    "uuid": "039e5026-be55-4ec6-a0ac-5ec82be0313b",
    "name": "Genea test support"
  },
  "location": {
    "uuid": "9eb44f41-3788-4b5b-b8f1-5f12b3956f69",
    "name": "GENEA test support India",
    "timezone": "Asia/Kolkata"
  },
  "time": {
    "$date": {
      "$numberLong": "1654061180000"
    }
  },
  "type": "ACCESS",
  "message": "Access Granted",
  "controller": {
    "uuid": "97aabb0c-7894-41df-a4ca-bfe3edf89c49",
    "name": "LP4502 450 (LPSERIES)",
    "mac": "00:0f:e5:0b:61:64",
    "model": "LP4502(LPSERIES)",
    "firmware_version": "1.29.6 (654)",
    "serial_number": "1004680",
    "connection": {
      "primary_host_connection": {
        "data_security_mode": "TLS Required",
        "encryption_status": "TLS Encrypted",
        "connection_type": "IP Client"
      }
    }
  },
  "actor": {
    "type": "USER",
    "user": {
      "uuid": "6fff0a80-8409-4008-9850-9ec8fb8431ee",
      "name": "Aditya Raval",
      "email": "aditya.raval@example.com",
      "avatar_file_name": "e48d7bbe-cca1-43ef-b48e-8030ec07e74b_1615202698.jpeg",
      "department": "Sales",
      "employee_number": null,
      "cost_center": "USA",
      "role": "ADMIN",
      "access_groups": [
        {
          "uuid": "91584e8c-4019-48c6-957b-0dbf45df3af1",
          "name": "Admin"
        },
        {
          "uuid": "3e23bd56-fcf7-40db-88a6-21bdc18996f3",
          "name": "Employee"
        },
        {
          "uuid": "fb83eed4-a84c-48c2-82fd-d71ff74a6780",
          "name": "Elevator Test"
        }
      ],
      "elevator_access_groups": [
        {
          "uuid": "6c4c0fc1-5577-48c9-bd6a-e83627e00cf5",
          "name": "All Floors"
        },
        {
          "uuid": "4c2ce295-d936-451b-b9d2-fb3396f76c9b",
          "name": "First Floor"
        },
        {
          "uuid": "2054fbbc-e0d6-477f-a5ca-78126cc5a5b5",
          "name": "Low Rises"
        }
      ],
      "is_card_access_revoked": false,
      "card_access_revoked_source": null
    }
  },
  "style": {
    "icon": "fa fa-circle-o green"
  },
  "sequr_code": "SEQUR_ACCESS_GRANTED",
  "note": null,
  "ip_address": null,
  "card": {
    "uuid": "fd192ab6-65b2-4c19-a041-6b9bad5fec12",
    "card_number": "12006461200",
    "type": "KEYCARD",
    "pin": null
  },
  "door": {
    "uuid": "42ac456e-df9b-4146-afb7-619dd584a524",
    "name": "Door Plaza Parking entry 1",
    "is_elevator_door": false,
    "elevator_door_type": null,
    "is_door_force_masked": true,
    "is_door_held_masked": true,
    "is_door_force_seen": false,
    "is_temperature_screening": false
  },
  "area": null,
  "interface_panel": {
    "uuid": "7056bcd0-140f-4381-93f6-2dd090c2c349",
    "name": "Internal SIO - Panel 27"
  },
  "control_point": [
    {
      "uuid": "8aedd858-cca9-4887-9d43-4953b933fa77",
      "name": "Gen Plaza CP"
    },
    {
      "uuid": "6c4c0fc1-5577-48c9-bd6a-e83627e00cf5",
      "name": "Plaza controller"
    }
  ],
  "monitor_point": [
    {
      "uuid": "8aedd858-cca9-4887-9d43-4953b933fa77",
      "name": "Gen Plaza MP1"
    },
    {
      "uuid": "6c4c0fc1-5577-48c9-bd6a-e83627e00cf5",
      "name": "Plaza monitor point"
    }
  ],
  "schedule": null,
  "card_format": {
    "uuid": "0306fca9-3757-4632-95ed-0b7be3e93407",
    "customer_uuid": "2abbb9a1-d102-4608-b798-e1c6128528e7",
    "is_deleted": false,
    "name": "37 bits",
    "description": "",
    "category": "PHYSICAL",
    },
    "updated_at": {
      "$date": {
        "$numberLong": "1668662843275"
      }
    },
    "created_by_user_uuid": null,
    "updated_by_user_uuid": null,
    "is_200_bit_fascn_to_128_bit_version_conversation": false,
    "is_card_id_check_with_other_formats": false
  },
  "additional_info": {
    "Card Number": "12006461200",
    "Format Number": 3,
    "Format Name": "37 bits",
    "description": "Access Granted - Full Test, Door Used"
  },
  "custom_attributes": [
    {
      "key": "testcheckbox",
      "value_text": [
        "TB"
      ]
    },
    {
      "key": "sample_date_picker",
      "value_date": "2022-05-08T05:50:24.000Z"
    },
    {
      "key": "license_plate",
      "value_text": "license may 8518"
    },
    {
      "key": "testaditya",
      "value_text": "TB"
    }
  ],
  "tenant": null,
  "elevator": null,
  "floor": null,
  "show_on_ui": true,
  "is_camera_centric": true,
  "created_at": {
    "$date": {
      "$numberLong": "1668662843275"
    }
  }
}

Here is the most used indexes:

1.    
       {
        "v" : 2.0,
        "key" : {
            "location.uuid" : 1.0,
            "customer.uuid" : 1.0,
            "sequr_code" : 1.0,
            "time" : 1.0,
            "actor.user.name" : 1.0,
            "actor.user.email" : 1.0,
            "actor.user.department" : 1.0,
            "actor.user.employee_number" : 1.0,
            "actor.user.cost_center" : 1.0,
            "custom_attributes.key" : 1.0,
            "custom_attributes.value_text" : 1.0,
            "custom_attributes.value_date" : 1.0,
            "card.card_number" : 1.0,
            "card.pin" : 1.0
        },
        "name" : "custom_default"
    },
  
2.   
    {
        "v" : 2.0,
        "key" : {
            "location.uuid" : 1.0,
            "customer.uuid" : 1.0,
            "sequr_code" : 1.0,
            "time" : 1.0,
            "actor.user.name" : 1.0,
            "actor.user.email" : 1.0,
            "actor.user.department" : 1.0,
            "actor.user.employee_number" : 1.0,
            "actor.user.cost_center" : 1.0,
            "actor.user.access_groups.name" : 1.0,
            "card.card_number" : 1.0,
            "card.pin" : 1.0
        },
        "name" : "filterable_fields"
    },

3.
 {
        "v" : 2.0,
        "key" : {
            "location.uuid" : 1.0,
            "time" : -1.0
        },
        "name" : "location_uuid",
        "background" : true
    },

Other indexes are also there but this is the index which is being used 90% of the time with the above query. I need to find only ten records no more than 10.

Current data size is 3 million which is not that high I guess. If configuration is the problem then for selective fields index it should not work very fast but with this it is giving result in milliseconds but I am stuck at a situation like this:

If user made search for username and he has mispelled the username i.e. mispelled name is not there in an entire Database then it is taking so much time and same with the regex.

Not necessary. A selective index would require much less work since documents that are not selected do not need to be read from storage while a non-selective index would need to fetch much more documents before return the result.

A $regex that is not anchored at the beginning is a lot slower that one. Rather than using $regex and the ignore case option for your email matches, I strongly recommend that you normalize your data by converting to all lower-case. You would then avoid needing to use regex.

Sorry for the late reply.
But the problem which I am facing is if user is giving random input let’s say email is abcdef@gmail.com and user is searching for email contains bcd but he is giving input like this BCd so for this we need to apply case insensitive input and also we do not know whether the provided input word starting alphabet is same as our actual data, means user can provide anything like this “sfdgssg”, so on this scenario fetching the data is taking a lot of time.

Also it is not specifically for email there are some fields which contains Capital and small letter together so for those fields it will be a problem.