Using Indices in dynamic queries

Hi all!

I have a collection with rather large documents (265,000 documents occupying 78 GB).
Users can query documents using any combination and order of search fields.
Creating an index takes a long time, because the server has to move all 78 GB through its memory.

To avoid restricting users to only certain combinations of search fields, I have come up with the following concept:

Create a metaInfos array field in each document.
Add an object of the form { name: "", value: "" } for each field that can occur in a search.
Create a SINGLE compound index on metaInfos.name + metaInfos.value.
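In shell syntax, that setup would look something like this (a sketch; the collection name "test" is taken from the examples further down):

```javascript
// Sketch: a single compound multikey index over the name/value pairs
db.getCollection("test").createIndex(
  { "metaInfos.name": 1, "metaInfos.value": 1 }
)
```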

Query any combination of search fields like this:

{
  "$and" : [
    {
      "metaInfos" : {
        "$elemMatch" : {
          "name" : "nameOfField1",
          "value" : "valueOfField1"
        }
      }
    },
    {
      "metaInfos" : {
        "$elemMatch" : {
          "name" : "nameOfField2",
          "value" : "valueOfField2"
        }
      }
    }
  ]
}

Explain shows that this uses the index for each of the queries I tested.

Now I also want to use aggregation on it.
E.g. I have a state field containing the current state of the document in string format.

I would like to aggregate by group like this:

db.getCollection("test").aggregate(
  [
    {
      "$group" : {
        "_id" : "$state",
        "count" : {
          "$sum" : NumberInt(1)
        }
      }
    }, 
  ], 
  {
    "allowDiskUse" : false
  });

But since the state is only available inside the array, I first have to unwind:

db.getCollection("test").aggregate(
  [
    {
      "$unwind" : {
        "path" : "$metaInfos"
      }
    }, 
    {
      "$match" : {
        "metaInfos.name" : "state"
      }
    }, 
    {
      "$group" : {
        "_id" : "$metaInfos.value",
        "count" : {
          "$sum" : NumberInt(1)
        }
      }
    }
  ], 
  {
    "allowDiskUse" : false
  }
);

Explain, however, shows that this does NOT use the index but performs a collection scan instead.
Is there any other way to aggregate on a specific array item that is not position-based like .0?

Thanks for your help
Andreas

Hi @Andreas_Kroll,

Welcome to the MongoDB Community!

Based on the shared information, I think you could use the $arrayElemAt operator in your aggregation pipeline. It allows you to access an element at a specific index within an array.

To assist you more effectively, could you please provide the following details: the index you have created on your collection, a sample document, and the version of MongoDB you are using? Additionally, let us know if you are using MongoDB on-prem or MongoDB Atlas.

Look forward to hearing from you.

Regards,
Kushagra


Once you $unwind, an index will no longer be used. Could you instead put a multikey index on metaInfos.value and use that as the first stage? You can then unwind and match again to filter out just what you want; the initial filter will hit the index and hopefully reduce the dataset before the aggregation has to filter on un-indexed data.

Do you have some sample data if I’ve misunderstood this?

I was thinking of something like this:

db.getCollection("Test").explain().aggregate([
{
    $match:{
        "metaInfos.value":{
            $in:["Thing D", "Thing F"]
        }
    }                
},
{
    $unwind:'$metaInfos'
},
{
    $match:{
        $or:[
            {$and:[{'metaInfos.value':'Thing D'},{'metaInfos.name':'Field 3'}]},
            {$and:[{'metaInfos.value':'Thing F'},{'metaInfos.name':'Field 3'}]}
        ]
    }
},
{
    $group:{
       "_id" : "$metaInfos.value",
        "count" : {"$sum" : 1}        
    }
}
])

So: show how many records matched “Thing D” or “Thing F” within the “Field 3” metadata tag, assuming each metadata name is defined only once per document…

The explain output for the above shows an IXSCAN:

                        "inputStage" : {
                            "stage" : "FETCH",
                            "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                    "metaInfos.value" : 1.0
                                },
                                "indexName" : "metaInfos.value_1",
                                "isMultiKey" : true,
                                "multiKeyPaths" : {
                                    "metaInfos.value" : [
                                        "metaInfos"
                                    ]
                                },

/Edit - this was the data I used:

{
    name:'Thing 1',
    metaInfos:[
        {
            value:'Thing A',
            name:'Field 1'
        },
        {
            value:'Thing B',
            name:'Field 2'
        },
    ]
},
{
    name:'Thing 2',
    metaInfos:[
        {
            value:'Thing C',
            name:'Field 1'
        },
        {
            value:'Thing D',
            name:'Field 3'
        },
    ]
},

Hi Kushagra,

I cannot use $arrayElemAt, because the { name: "xxx", value: "yyy" } tuples are not in the same order, and each document can have a different set of them.

Is there any function that could search the contents of one property and return the contents of another property?

Something like $arrayElemFind, called with name: "xxx", that returns the corresponding value?
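What I imagine would look something like this untested sketch, combining $filter and $arrayElemAt (using my state field as the example; $set needs MongoDB 4.2+, otherwise $addFields):

```javascript
// Untested sketch: pick the array entry whose name is "state",
// then read its value. $arrayElemAt avoids positional .0 assumptions.
db.getCollection("test").aggregate([
  {
    $set: {
      state: {
        $let: {
          vars: {
            hit: {
              $arrayElemAt: [
                {
                  $filter: {
                    input: "$metaInfos",
                    cond: { $eq: ["$$this.name", "state"] }
                  }
                },
                0
              ]
            }
          },
          in: "$$hit.value"
        }
      }
    }
  }
])
```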

Hi @Andreas_Kroll,

It’s a little difficult to assist without the sample document structure. However, I think the $arrayToObject aggregation pipeline operator might be beneficial in this scenario.

Regards,
Kushagra

Hi!

Is there any way I can transform the array of { name: "", value: "" } pairs into an object like

{
  "name1": "value1", 
  "name2": "value2"
}

I tried using $arrayToObject, but it requires the properties to be named k and v and does not work otherwise.

Any hints?

Hi @Andreas_Kroll,

Considering the following sample document structure:

{
    _id: ObjectId("64b531930b74e330f809d559"),
    arrayField: [
      { name: 'name1', value: 'value1' },
      { name: 'name2', value: 'value2' }
    ]
}

I have tested the following aggregation pipeline using the $replaceRoot and $arrayToObject operators, which may give you the required result.

db.test.aggregate([{
  $replaceRoot: {
    newRoot: {
      $arrayToObject: {
        $map: {
          input: "$arrayField",
          in: {
            k: "$$this.name",
            v: "$$this.value"
          }
        }
      }
    }
  }
}
])

And it is returning the following output:

{ name1: 'value1', name2: 'value2' }
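To see what the $map stage is doing, here is the same rename sketched in plain JavaScript (illustrative only; the real transformation runs server-side inside the pipeline):

```javascript
// Plain-JavaScript sketch of the $map -> $arrayToObject step above:
// rename each { name, value } pair to a [key, value] entry and fold
// the array into a single object.
const arrayField = [
  { name: "name1", value: "value1" },
  { name: "name2", value: "value2" },
];

const asObject = Object.fromEntries(
  arrayField.map(({ name, value }) => [name, value])
);

console.log(asObject); // { name1: 'value1', name2: 'value2' }
```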

I’ve only briefly tested this, so I recommend trying it in a test environment to see if it suits your use case and requirements.

Hope it helps!

Regards,
Kushagra


Hi again,

thanks for the quick reply.
I think it was a step in the right direction.

However, I checked with explain, and it still reports a collection scan :frowning:
Even if I restrict the query strictly to the metaInfos fields.
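As far as I understand it, the only arrangement where the index could apply is with a $match on the indexed fields as the very first stage, before any $replaceRoot or $unwind, since the reshaped documents no longer carry the indexed paths. Something like this untested sketch (the value "done" is made up):

```javascript
// Untested sketch: the $match on the indexed fields must come first;
// after $replaceRoot reshapes the documents, the index can no longer apply.
// The value "done" is a made-up example.
db.getCollection("test").aggregate([
  { $match: { metaInfos: { $elemMatch: { name: "state", value: "done" } } } },
  {
    $replaceRoot: {
      newRoot: {
        $arrayToObject: {
          $map: {
            input: "$metaInfos",
            in: { k: "$$this.name", v: "$$this.value" }
          }
        }
      }
    }
  }
])
```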