Unique index with partial filter type string index in array of object throws duplicate key error

I have a field members an array of object,

db.getCollection('collection').insert({
  "members": [
    { "userId": "abc1" },
    { "userId": null }
  ]
})

I have created a unique + partial index (expression type should be string),

db.getCollection('collection').createIndex(
  { "members.userId": 1 },
  {
    "unique" : true,
    "partialFilterExpression" : {
      "members.userId" : {
        "$type" : "string"
      }
    }
  }
)

When i insert a new document with a null value:

db.getCollection('collection').insert({
  "members": [
    { "userId": "abc2" },
    { "userId": null }
  ]
})

It throws a duplicate key error,

E11000 duplicate key error collection: sample.collection index: members.userId_1 dup key: { members.userId: null }

I think it should work because we have added a partial filter type by string,

Is there any other option to do this scenario?

What about a partialFilterExpression that exclude null in addition to $type:“string”?

Something that looks like

"members.userId" : { "$ne" : null }

Since both conditions refer to members.userId you probably need to use $and.

Actually, $ne is not a supportive operator for partial index expression, see allowed expressions, that is why i am using $type: "string" expression condition.

1 Like

Hi @turivishal

I think you’re hitting the limitations of partial index filter as described in SERVER-17853 where it mentions that some features like $elemMatch are not yet supported (which I think is a requirement for your use case).

For a small demonstration, I created an index exactly like yours:

> db.coll.createIndex({'members.userId':1}, 
  {unique:true, 
   partialFilterExpression: {'members.userId': {$type: 'string'}}})
members.userId_1

Then I inserted a document with a string and a number as members.userId:

> db.coll.insertOne({members:{userId:'abc'}})
> db.coll.insertOne({members:{userId:123}})

So the first document should be in the index, while the second document should not:

> db.coll.find().hint('members.userId_1')
[
  {
    _id: ObjectId("60d180f635b7b416ce4312a5"),
    members: { userId: 'abc' }
  }
]

The hint() forces MongoDB to use that index, which shows that the first document does exist in the index, while the second one doesn’t. In this case, it behaves as expected.

However, once we change members into an array, it behaves a bit differently:

> db.coll.insertOne({members:[{userId:'def'}, {userId:123}]})
> db.coll.insertOne({members:[{userId:'ghi'}, {userId:123}]})
MongoError: E11000 duplicate key error collection: test.coll index: members.userId_1 dup key: { members.userId: 123 }

So it’s complaining about a duplicate key of 123 which technically should not be in the index, but it actually is:

> db.coll.find({'members.userId':123}).hint('members.userId_1')
[
  {
    _id: ObjectId("60d1811835b7b416ce4312a8"),
    members: [ { userId: 'def' }, { userId: 123 } ]
  }
]

This is because I think you need $elemMatch for the partial filter expression (since it involves an array, and you’re matching one element of the array), where it’s not supported yet.

As a workaround, my suggestion is to not use an array for now (until SERVER-17853 is resolved). Having said that, if this feature is important for your workflow, please comment/vote on the aforementioned SERVER-17853 describing your use case.

Best regards
Kevin

4 Likes

I did that, and thank you for your explanation.

1 Like

For now, the sparse index will work if i don’t include null property in the document, because it will exclude not exists field from index,

Create Index:

db.getCollection('collection').createIndex(
  { "members.userId": 1 },
  {
    "unique" : true,
    "sparse" : true
  }
)

Insert Documents:

db.getCollection('collection').insert({
  "members": [{ "userId": "abc1" }]
})
db.getCollection('collection').insert({
  "members": [{ "userId": "abc2" }, { "otherProperty": "123" }]
})
2 Likes

I think there is some semantic ambiguity in what partialFilterExpression does.

The docs say:

Partial indexes only index the documents in a collection that meet a specified filter expression.

That does not mean that it only indexes the entries in the document that match the filter expression.

So if the document has any element that is of type string, then it will match the partial filter expression and it will be indexed fully (meaning all the array elements will be indexed). Now, this might not be what you are expecting or want to implement but that’s how it currently behaves.

I’m glad the sparse attribute solves your problem.

Asya Kamsky

2 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.