UUID representation mismatch with pymongo 4.0.1

I’m upgrading from pymongo 3.12 to pymongo 4.0.1, and I’m running into some weird behaviour with the UUID representation. The database connection is configured with CodecOptions(uuid_representation=4), i.e. UuidRepresentation.STANDARD.

There is a difference between return values when using normal insertion and when using bulk insertion:

# bulk gives binary representation
doc = {'_id': uuid.uuid4()}
result = collection.bulk_write([ReplaceOne(doc, doc, upsert=True)])
result.inserted_ids
> {0: Binary(b'\xb6\r%\xf2\xadOMo\x919\xc4\x98P\xe5\n\x02', 4)}
# normal insertion gives UUID representation
result = collection.insert_one({'_id': uuid.uuid4()})
result.inserted_id
> UUID('1ec97f53-b1ae-471b-9e0c-157ca9b02953')

I was wondering whether this is by design, and whether there is a way to make the bulk result also return the Python UUID representation.

Thanks for reporting this issue. It is indeed a bug, and I’ve opened a bug report here: https://jira.mongodb.org/browse/PYTHON-3075

Note that PyMongo still sends the expected UUID format to the server; only the client-side upserted_ids field comes back as an unexpected type. A temporary workaround for this bug is to convert the Binary into the expected type:

import uuid
from typing import Any, Dict, Mapping
from bson.binary import Binary, UuidRepresentation
from pymongo import MongoClient
from pymongo.collection import Collection
from pymongo.operations import ReplaceOne

def convert_ids(upserted_ids: Mapping[int, Any], coll: Collection) -> Dict[int, Any]:
    """Temporary workaround for https://jira.mongodb.org/browse/PYTHON-3075.

    Use like this::

        result = collection.bulk_write([...])
        upserted_ids = convert_ids(result.upserted_ids, collection)

    """
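    res = {}
    # UUID representation configured on the collection (STANDARD for subtype 4).
    rep = coll.codec_options.uuid_representation
    for idx, _id in upserted_ids.items():
        if rep == UuidRepresentation.UNSPECIFIED:
            # With UNSPECIFIED, native UUIDs are not decoded, so report Binary.
            if isinstance(_id, uuid.UUID):
                _id = Binary.from_uuid(_id)
        elif isinstance(_id, Binary) and _id.subtype in (3, 4):
            # Only UUID subtypes (3 and 4) are decoded; other Binary ids pass through.
            _id = _id.as_uuid(rep)
        res[idx] = _id
    return res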

client = MongoClient(uuidRepresentation='standard')
collection = client.test.test
doc = {'_id': uuid.uuid4()}
result = collection.bulk_write([ReplaceOne(doc, doc, upsert=True)])
for _id in convert_ids(result.upserted_ids, collection).values():
    assert isinstance(_id, uuid.UUID)
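
To see why this conversion loses nothing: with the standard representation, a Binary of subtype 4 holds the UUID's 16 raw bytes in RFC 4122 order. The following is a minimal sketch, not part of the workaround itself, that decodes the payload from the bulk_write output in the question using only the standard library. The variable names are illustrative.

```python
import uuid

# Payload copied from the Binary(..., 4) value shown in the question.
raw = b'\xb6\r%\xf2\xadOMo\x919\xc4\x98P\xe5\n\x02'

# Subtype 4 stores the UUID bytes unchanged, so uuid.UUID can read them directly.
decoded = uuid.UUID(bytes=raw)

print(decoded)          # b60d25f2-ad4f-4d6f-9139-c49850e50a02
print(decoded.version)  # 4, consistent with uuid.uuid4()
```

The legacy representations (subtype 3) may reorder bytes, so this direct decoding only applies to subtype 4.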

Please follow the jira ticket for updates.

One question, though: are you using pymongo directly or through a wrapper library? I ask because BulkWriteResult doesn’t have an inserted_ids property (it does have an upserted_ids property).


Thank you for the workaround and the bug report! You’re right, it was upserted_ids. I apparently did something weird while copy-pasting and formatting the snippet to make it look nice…