Pymongo 4.0 - UpdateOne + array_filters + bulk_write never modifies matched documents

Hi everyone,

I ran across behavior that I think is a bug in pymongo, as running update_one with the same parameters in mongosh actually modifies the matched documents.
In my database I have objects of roughly this shape:

{
 'status': 'VERIFIED',
 'group': 0,
 ...,
 'assignments': [
  'assignment_id': 'LONG_STRING',
  'reviewed': false
  ...
 ]
}

When I run the following piece of code, the bulk_api_result shows that a single document is indeed matched, but the modified_count is always 0:

def update_assignment_statuses_from_dict(assignment_id_status_dict: dict[str, AssignmentStatus]):
    if assignment_id_status_dict:
        update_operations = [UpdateOne(
            {'assignments': {'$exists': True}},
            {'$set': {'assignments.$[assig].status': status.value}},
            array_filters=[{'assig.assignment_id': assignment_id}],
            upsert=True
        ) for assignment_id, status in assignment_id_status_dict.items()]
        bulk_results = DB.get().pages.bulk_write(update_operations)

I’m aware that an existing field with the same value will not be modified, however none of my documents have the status field set.
Running essentially the same thing in mongosh does return a modifiedCount of 1:

db.pages_old.updateOne({'assignments': {'$exists': true}}, {'$set': {'assignments.$[assig].status': 'MANUALLY_VERIFIED'}}, {'arrayFilters': [{'assig.assignment_id': 'SOME_ID'}]})

It seems to me that the behavior is due to the usage of bulk_write, however I could be wrong and would like some insight into this problem.

I hope I’m not too tired to see a dumb typo or something… cheers!

Hey! Thank you for bringing up this bug report. I appreciate you detailing the structure of your documents, and your code. I have a few questions first about your configuration, please reply with all of the information detailed in the “How To Ask for Help” section of the README.
Could you additionally provide the complete results from the “bulk_results” variable when doing one of these UpdateOne operations?

Here is the requested info:

python -c "import sys; print(sys.version)"
3.9.5 (default, Nov 18 2021, 16:00:48) 
[GCC 10.3.0]
python -c "import pymongo; print(pymongo.version); print(pymongo.has_c())"
4.0
True

System info: Linux Kubuntu 21.04 5.11.0-49-generic
From frameworks I’m using django==3.2.9
The DB is MongoDB 5.0.6 Enterprise - Free hosted 500MB version

Hi, thank you for that information. With Python 3.9.5, PyMongo 4.0, and MongoDB 5.0.6, I am not encountering the bug. One of the first things you are missing is curly braces around the elements of “assignments”–remember it is a list of dictionaries, not a list of key-value pairs. Fixing that, however, yields code that runs fine:

        self.coll.insert_many(([{
         "assignments": [{
         "assignment_id": "TEST" }]
        }, {
         "assignments": [{
         "assignment_id": "TEST2" }]
        }]))
        result = self.coll.bulk_write([UpdateOne({'assignments': {'$exists': True}}, {'$set': {
            'assignments.$['
                                                                           'assig].status':
                                                                               'MANUALLY_VERIFIED'}}, array_filters=[{'assig.assignment_id': 'TEST'}], upsert=True)])
        assert result.modified_count > 0
        print(result.modified_count)
        print(list(self.coll.find({})))
test_modified_ids (test.test_bulk.TestBulk) ... 1
[{'_id': ObjectId('6227dc6f730240932ee4192f'), 'assignments': [{'assignment_id': 'TEST', 'status': 'MANUALLY_VERIFIED'}]}, {'_id': ObjectId('6227dc6f730240932ee41930'), 'assignments': [{'assignment_id': 'TEST2'}]}]
ok

Does this fix your issue? If not, could I see a full example of one of your documents (with sensitive data scrubbed)? There is also the possibility that pages_old and pages differ in some way you have not noticed, and that this accounts for the difference in behavior. What happens when you run bulk_write on pages_old? Another reason I suspect this might not be a bug in PyMongo is that our bulk_write API passes most of the arguments through completely unchanged. As you can see from the source code for add_update, nothing is done to the array_filters or the update document.

    def add_update(
        self,
        selector,
        update,
        multi=False,
        upsert=False,
        collation=None,
        array_filters=None,
        hint=None,
    ):
        """Create an update document and add it to the list of ops."""
        validate_ok_for_update(update)
        cmd = SON([("q", selector), ("u", update), ("multi", multi), ("upsert", upsert)])
        collation = validate_collation_or_none(collation)
        ...
        if array_filters is not None:
            self.uses_array_filters = True
            cmd["arrayFilters"] = array_filters
        ...
        self.ops.append((_UPDATE, cmd))

Sorry for taking so long to answer, but the code you posted has a weird formatting error. The selector is cut into two strings and doesn’t make sense. Also, I don’t see where I’m missing curly braces. Your code, at least the valid parts, seems identical to mine.

Edit: I’ve even reconstructed your answer and ran it through a diff tool. Our Python code is identical.

Edit2: I’ve played around with the query and it seems like I have to add an _id of the page in order for it to be updated. To me it really seems like a bug in the logic.

For clarity, here is a full page document:

{
  "_id": "012dbb1c2a08407eb74f82a3feccaef0-03",
  "status": "VERIFIED",
  "pdf_id": "012dbb1c2a08407eb74f82a3feccaef0",
  "group": 0,
  "HIT_ids": [
    "REDACTED"
  ],
  "published": [
    {
      "$date": "2022-03-19T21:22:03Z"
    }
  ],
  "assignments": [
    {
      "assignment_id": "REDACTED",
      "worker_id": "REDACTED",
      "HIT_id": "REDACTED",
      "auto_approval_time": {
        "$date": "2022-03-21T21:36:34Z"
      },
      "submit_time": {
        "$date": "2022-03-19T21:36:34Z"
      },
      "reviewed": true,
      "environment": "production",
      "answer": {
        "appVersion": "0.1.3",
        "secondCounter": 52,
        "canvasHeight": 1754,
        "canvasWidth": 1241,
        "annotations": [],
        "comment": "Second 2-assignment paid batch"
      },
      "status": "MANUALLY_ACCEPTED"
    },
    {
      "assignment_id": "REDACTED",
      "worker_id": "REDACTED",
      "HIT_id": "REDACTED",
      "auto_approval_time": {
        "$date": "2022-03-21T21:40:22Z"
      },
      "submit_time": {
        "$date": "2022-03-19T21:40:22Z"
      },
      "reviewed": true,
      "environment": "production",
      "answer": {
        "appVersion": "0.1.3",
        "secondCounter": 29,
        "canvasHeight": 1754,
        "canvasWidth": 1241,
        "annotations": [],
        "comment": "Second 2-assignment paid batch"
      },
      "status": "MANUALLY_ACCEPTED"
    }
  ],
  "accepted_assignment_id": "REDACTED"
}

Hi,

Thanks for the full document. I was able to replicate and then solve your problem. The issue is that your selector is {'assignments': {'$exists': True}}, which matches the first document in the collection that has an assignments field. Because the operation is an UpdateOne, it stops after that first match. The array filter is then applied only to that one document, so if none of its assignments has the requested assignment_id, the document is matched but not modified, and the result reports zero modified documents.

I could not reproduce this earlier because in my reproduction the first document and the document I wanted to modify were the same. After changing the order of my documents so that they differ, as in your case, I get the same result as you.

If you change your selector to {'assignments.assignment_id': assignment_id}, it should do what you want. I believe this is also why it works when you specify the _id manually: in both cases the selector picks the document that actually contains the assignment, instead of whichever document happens to come first.
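
To make the difference between the two selectors concrete, here is a minimal pure-Python sketch of the behavior described above. It is not PyMongo code and does not talk to a server: simulate_update_one, make_pages and the two selector helpers are hypothetical names made up for this illustration, and the sketch assumes documents are visited in list order (a real collection gives no such ordering guarantee without a sort).

```python
# Pure-Python stand-in for update_one + array_filters; this is NOT PyMongo code.

def make_pages():
    """Two hypothetical page documents; the first one does not hold assignment 'B'."""
    return [
        {"_id": 1, "assignments": [{"assignment_id": "A"}]},
        {"_id": 2, "assignments": [{"assignment_id": "B"}]},
    ]


def has_assignments(doc):
    """Mimics the selector {'assignments': {'$exists': True}}."""
    return "assignments" in doc


def has_assignment_id(assignment_id):
    """Mimics the selector {'assignments.assignment_id': assignment_id}."""
    def predicate(doc):
        return any(a.get("assignment_id") == assignment_id
                   for a in doc.get("assignments", []))
    return predicate


def simulate_update_one(docs, selector, assignment_id, status):
    """Return (matched_count, modified_count) for one simulated update."""
    for doc in docs:
        if not selector(doc):
            continue
        # Only the first matching document is ever considered.
        modified = 0
        for assignment in doc["assignments"]:
            # Mimics array_filters=[{'assig.assignment_id': assignment_id}].
            if (assignment.get("assignment_id") == assignment_id
                    and assignment.get("status") != status):
                assignment["status"] = status
                modified = 1
        return 1, modified
    return 0, 0


broad_result = simulate_update_one(
    make_pages(), has_assignments, "B", "MANUALLY_VERIFIED")
targeted_result = simulate_update_one(
    make_pages(), has_assignment_id("B"), "B", "MANUALLY_VERIFIED")
print(broad_result)     # (1, 0): matched the first page, nothing to modify in it
print(targeted_result)  # (1, 1): matched the page that holds assignment 'B'
```

With the broad selector the first page is matched even though it does not contain assignment 'B', so nothing is modified. With the targeted selector the matched page is the one that contains the assignment, so the array filter finds an element to update.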
