Nested array operation for $all does not seem to work in mongodb

Arkensor · June 8, 2023, 5:33pm

I am trying to find documents where any or all nested array elements must contain at least the values provided. This can normally be achieved easily with the $all operator. For example here:

[
  {
    "words": [ "Hello", "World", "!"],
    "wordArrays": [
      ["Hello", "World", "!"],
      ["The", "Sun", "Shines"]
    ]
  }
]

The query {words: {$all: ["Hello", "World"]}} correctly matches because both are contained and {words: {$all: ["Hello", "I do not exist"]}} does not match. So far so good.

If you attempt the same for the nested arrays this stops working altogether. {"wordArrays.0": {$all: ["Hello", "World"]}} does not match. I have found similar questions why on nested arrays all the operators start to fail but no one ever can explain why and they just use $elemMatch. But I do not know how I would translate my “all values must be in the array” to an $elemMatch. And then how I would say

ANY of the wordArrays must contain “Hello” and “World” → would match
ALL of the wordArrays must contain “Hello” and “World” → would not match the data

I have prepared this little playground with the data in the hopes somebody knows how to achieve this.
Playground

I would prefer if this could be solved all within the Find() stage, but if operations like this suddenly require the aggregation pipeline I am fine with that too.

My approach for ANY match does not return any matches, and I do not understand why {wordArrays: {$elemMatch: {$all: ["Hello", "World"]}}}. I read it as wordArrays: does any element match: array contains “Hello” and “World”. And that should have matched the document

Thank you!

Aasawari · October 18, 2023, 6:27am

Hi @Arkensor and welcome to MongoDB community forums!!

I apologise for getting back to you so late.

The find query would not work in the above mentioned format. However, if your sample document looks like:

Atlas atlas-cihc7e-shard-0 [primary] test> db.server.find()
[
  {
    _id: ObjectId("652e5a208c360b950bb50eea"),
    word: 'Hello',
    words: [ 'Hello', 'World', '!' ],
    wordArrays: [ [ 'Hello', 'World', '!' ], [ 'The', 'Sun', 'Shines' ] ]
  }
]

the query:

Atlas atlas-cihc7e-shard-0 [primary] test> db.server.find( {“wordArrays”: { “$all”: [[“Hello”,“World”,“!”]] } })
[
{
_id: ObjectId(“652e5a208c360b950bb50eea”),
word: ‘Hello’,
words: [ ‘Hello’, ‘World’, ‘!’ ],
wordArrays: [ [ ‘Hello’, ‘World’, ‘!’ ], [ ‘The’, ‘Sun’, ‘Shines’ ] ]
}
]
would give the result as the ‘[“Hello”,“World”,“!”]’ matches the complete element at index 0.

If you wish to use $elematch to match for a specific element in the array, you can use the query as

Atlas atlas-cihc7e-shard-0 [primary] test> db.server.find({wordArrays: {$elemMatch: {$all: [["Hello", "World", "!"]]}}})
[
  {
    _id: ObjectId("652e5a208c360b950bb50eea"),
    word: 'Hello',
    words: [ 'Hello', 'World', '!' ],
    wordArrays: [ [ 'Hello', 'World', '!' ], [ 'The', 'Sun', 'Shines' ] ]
  }
]

If you wish to use aggregations, you can make use of $arrayElemAt to project element at a specific index value.

Also, could you clarify with sample documents and expected response which explains the below statements.

Please feel free to reach out in case of any further questions.

Regards
Aasawari

Arkensor · October 28, 2023, 10:10am

Hello @Aasawari,

thank you for getting back to me on this. To clarify a few things:

The first example was intentionally about checking the 0th item. I expect this code to work as it reads as follows: On the document go into “wordArrays” property. It is an array and the 0th item shall be accessed. On data data present on the 0th item it shall be checked that: ALL the values [“Hello”, “World”] are present on the data. If so return true true. False otherwise.

{"wordArrays.0": {$all: ["Hello", "World"]}}

The reason why I say it should work like this is because on the example playground, I linked it works if I have a non-nested array.

db.collection.find({ words: {"$all": ["Hello","World"]} })

So my expectation as user is that if I manually navigate the field accessor for the nested array to the 0th item, the data he inspects is the same as if I ask the query to do it on a field that only has this one array as data.
I would consider this a bug / lack of feature support. Which is why I opened https://jira.mongodb.org/browse/SERVER-77974

That is true but not what I was asking about here. That is an EXACT match on ANY of the items to find find the index. My first example was about already having the index and wanting to validate if I find the required data there or not.

As for the second part of my question, this was about achieving my overall goal. The issue raised over “array.INDEX” query not working as expected was just a finding I had while trying to get there.

My goal is exactly what I describe there. I have multiple documents. They each contain the “wordArrays” field which holds multiple arrays. I now want to check what I wrote on all the documents.

So for the 1. one given the data below, I expect the query I am looking for to match since there was one nested array that contained both “Hello” and “World”. On the same data if I looked for “Not” and “Exists” it should not match because none of the arrays contained both these strings.

[{"wordArrays": [[ "Hello","World","!"],["The","Sun","Shines"]]}]

For the 2. query I want to not find just one matching nested array, but I want to validate that ALL the nested arrays must contain both the values. Given the same data example as used above this should fail, because “Hello” and “World” are present in the 0th array, but the 1st contains neither of the words.

Writing these queries is easy if you just inspect one array that is not nested. I however have data that I can not and do not want to change in any way and want to query the nested arrays to do these ANY and ALL operations on them. I am looking for those two examples because from those two working queries, I should be able to derive the other variants of any/all nested arrays inspected for any/all item matches.

Aasawari · November 8, 2023, 6:36am

Hi @Arkensor thank you for the details

My understanding is that using the dot notation references the specific element at the array index as opposed to the contents of the specified element at the array index. If we combine this with the example noted in the behaviour portion of the $all operator documentation, we can say that the query you mentioned:

db.collection.find({
   "wordArrays.0":{
      "$all":[
         "Hello",
         "World"
      ]
   }
})

Is equivalent to:

db.collection.find({
   "$and":[
      {
         "wordArrays.0":"Hello"
      },
      {
         "wordArrays.0":"World"
      }
   ]
})

In saying the above, we can see the element in the array "wordArray" at index 0 for the is [ "Hello","World","!"]. Neither the string "Hello" or "World" match this so nothing is returned. The use of dot notation here does not “access” the array as you have mentioned. I.e. in this example, use of the dot not notation will not look into the contents of this sub-array to determine if it contains the strings and instead will just check if the specified element is equal to the strings "Hello" and "World".

From the above example and the documentation, I believe this works as the expected. However, i understand this may not be the behavour you expect so this may be better suited as a feedback post as opposed to SERVER ticket. Please raise a feedback request in the MongoDB Feedback Engine or upvote for an existing/similar request if present.

If I understand correctly, you are looking for a query which would give all the documents which contains [‘hello’,‘world’] irrespective of the index position.
If your wordsArray size is small, you can use a query like:

Atlas atlas-cihc7e-shard-0 [primary] test> db.testall.find( {$or: [{'wordsArray.0': ['hello', 'world']}, {'wordsArray.1': ['hello', 'world']}, {'wordsArray.2': ['hello', 'world']} ] })
[
  {
    _id: ObjectId("654b212957b6dc078e3deea5"),
    words: [ 'hello', 'world', '1' ],
    wordsArray: [
      [ 'the', 'sum', 'flower' ],
      [ 'not', 'exists' ],
      [ 'hello', 'world' ]
    ]
  }
]

where in my sample data, [‘hello’,‘world’] is at the 2nd index.

I request your assistance in providing a sample dataset and outlining the anticipated response if the provided data does not align with your requirements.

Consider the sample data contains:

Atlas atlas-cihc7e-shard-0 [primary] test> db.testall.find()
[
  {
    _id: ObjectId("654b206d57b6dc078e3deea4"),
    words: [ 'hello', 'world', '1' ],
    wordsArray: [ [ 'hello', 'world', '1' ], [ 'the', 'sum', 'flower' ] ]
  },
  {
    _id: ObjectId("654b212957b6dc078e3deea5"),
    words: [ 'hello', 'world', '1' ],
    wordsArray: [
      [ 'the', 'sum', 'flower' ],
      [ 'not', 'exists' ],
      [ 'hello', 'world' ]
    ]
  },
  {
    _id: ObjectId("654b26d457b6dc078e3deea6"),
    words: [ 'hello', 'world', '1' ],
    wordsArray: [ [ 'hello', 'world' ], [ 'hello', 'world' ] ]
  }
]

If you use the query with $and, the below query would return.

Atlas atlas-cihc7e-shard-0 [primary] test> db.testall.find( {$and: [{'wordsArray.0': ['hello', 'world']}, {'wordsArray.1': ['hello', 'world']} ] })
[
  {
    _id: ObjectId("654b26d457b6dc078e3deea6"),
    words: [ 'hello', 'world', '1' ],
    wordsArray: [ [ 'hello', 'world' ], [ 'hello', 'world' ] ]
  }
]

Could you outline the particular reason or use case for this schema design?

Regards
Aasawari

Arkensor · November 8, 2023, 3:16pm

Hello again @Aasawari,

that explanation is rather helpful. I can see now how the index accessor syntax together with $all was not what I expected. So that is fair enough.

About the other query. Yes if it was only a few items it could be hard coded with indices, however, my data could have 0…N amount of wordArrays where each array can have 0…N number elements.

I am not directly in control of the data. I am writing a translation layer of user-defined queries into MongoDB, hence I can not make any assumptions on the data. I just know it is an array of arrays of strings and to find matches where N user-provided strings are present. I have some ugly solution with using $map to flatten the nested arrays into booleans if they contain all search words or not, and then check if the outer array contained “true” anywhere, but I am not sure if there could maybe be a nicer option.

The data I have given you is exactly my use case. There is no other data that I can give you to make this any more clear.

[
  {
    _id: ObjectId("654b212957b6dc078e3deea5"),
    words: [ 'hello', 'world', '1' ],
    wordsArray: [
      [ 'the', 'sum', 'flower' ],
      [ 'not', 'exists' ],
      [ 'justoneword' ],
      [ 'many', 'many', 'many', 'many', 'wordsinonearray' ],
      <<<TRUNCATED - 500 MORE ARRAY ITEMS HERE >>>,
      [ 'hello', 'world' ]
    ]
  }
]

Thank you very much