Access one specific embedded document using MongoEngine

I have a database with the following schema:

[
    image: {
        image_name: str,
        date: date,
        labels: [
            label: {
                label_name: str,
                version: int,
                features: [
                    feature: {
                        feature_name: str,
                        geometry: Polygon,
                    },
                    ...
                ]
            },
            ...
        ]
    },
    ...
]

I am writing an API for this database, and I am trying out MongoEngine. I define my documents as follows:

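from bson import ObjectId
from mongoengine import (
    DateTimeField, Document, EmbeddedDocument, EmbeddedDocumentListField,
    IntField, ObjectIdField, PolygonField, StringField,
)

class Feature(EmbeddedDocument):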
    feature_name = StringField(required=True)
    geometry = PolygonField(required=True)
    id = ObjectIdField(required=True, default=ObjectId, unique=True, primary_key=True)

class Label(EmbeddedDocument):
    label_name = StringField(required=True)
    version = IntField(required=True)
    features = EmbeddedDocumentListField(Feature)
    id = ObjectIdField(required=True, default=ObjectId, unique=True, primary_key=True)

class Image(Document):
    image_name = StringField(required=True)
    date = DateTimeField(required=True)
    labels = EmbeddedDocumentListField(Label)

I would like to access a specific embedded document.
I can access a top-level document this way:

image = next(Image.objects(id=image_id))

But this doesn’t work for embedded documents:

label = next(Label.objects(id=label_id))  # AttributeError: type object 'Label' has no attribute 'objects'

Prior to using MongoEngine, I was using the following code with PyMongo:

# PyMongo code to get label based on label_id
pipeline = [
    {"$unwind": "$labels"},
    {"$match": {"labels._id": label_id}},
    {"$project": {"label": "$labels"}},
]
cursor = self.db.images.aggregate(pipeline)
label_dict = next(cursor)["label"]  # returns a dict
label = Label.from_dict(label_dict)

# PyMongo code to get feature based on feature_id
pipeline = [
    {"$unwind": "$labels"},
    {"$unwind": "$labels.features"},
    {"$match": {"labels.features._id": feature_id}},
    {"$project": {"feature": "$labels.features"}},
]
cursor = self.db.images.aggregate(pipeline)
feature_dict = next(cursor)["feature"]  # returns a dict
feature = Feature.from_dict(feature_dict)

What would be the best solution using MongoEngine?

Many thanks in advance for your help!

Thibaut

I think I found a (partial) solution to my problem, after reading this SO answer.

I can simply chain get() calls, starting from the top-level document.

For example, considering my schema, I can query a given label (embedded-document) with:

label = Image.objects.get(id=image_id).labels.get(id=label_id)

And I can query a given feature (nested embedded-document) with:

feature = Image.objects.get(id=image_id).labels.get(id=label_id).features.get(id=feature_id)

But these queries require the parent documents' IDs.

I can obtain the parent image (and therefore its ID) from a given label ID with objects.filter():

parent_image = Image.objects.filter(labels__id=label_id)[0]

I can also obtain a parent image ID based on a given feature ID:

parent_image = Image.objects.filter(labels__features__id=feature_id)[0]

But I can’t seem to obtain the parent label ID based on the feature ID:

parent_label = Image.objects.get(id=parent_image.id).labels.filter(features__id=feature_id)[0]
# Raises "AttributeError: 'Label' object has no attribute 'features__id'"
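The error message suggests that filter() on an embedded list looks up each keyword as a plain attribute name, without the double-underscore traversal that a QuerySet supports. One possible workaround, assuming the parent image is already loaded, is to scan its labels in plain Python. This is only a sketch: find_parent_label is a hypothetical helper name, and SimpleNamespace objects stand in for the real documents so that it runs without a database.

```python
from types import SimpleNamespace


def find_parent_label(image, feature_id):
    """Return the first label of `image` holding a feature with this id, else None."""
    for label in image.labels:
        if any(feature.id == feature_id for feature in label.features):
            return label
    return None


# Stand-in objects (not real MongoEngine documents), for illustration only.
image = SimpleNamespace(labels=[
    SimpleNamespace(id="L1", features=[SimpleNamespace(id="F1"), SimpleNamespace(id="F2")]),
    SimpleNamespace(id="L2", features=[SimpleNamespace(id="F3")]),
])

parent_label = find_parent_label(image, "F3")
```

With a real document, the same helper would be called on parent_image, since it only relies on the labels, features and id attributes.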

You can find the Image document that contains your label like this:

image = Image.objects(labels__id=label_id).first()

See "2.5. Querying the database" in the MongoEngine 0.27.0 documentation.

You can also use the aggregation API directly, as described in "2.5. Querying the database" of the MongoEngine 0.27.0 documentation.
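As a rough sketch of the aggregation route, the pipeline from the question can be built by a small function and handed to the queryset. The name label_pipeline is hypothetical, the leading $match stage is an addition that was not in the original pipeline (it narrows the images before unwinding), and the aggregate() call in the comment is an assumption, because its exact signature has changed between MongoEngine versions.

```python
def label_pipeline(label_id):
    """Build an aggregation pipeline that returns one embedded label by its _id."""
    return [
        {"$match": {"labels._id": label_id}},   # added: keep only images containing the label
        {"$unwind": "$labels"},                 # one result document per label
        {"$match": {"labels._id": label_id}},   # keep only the requested label
        {"$project": {"_id": 0, "label": "$labels"}},
    ]


# A placeholder string is used here; a real call would pass an ObjectId.
pipeline = label_pipeline("some-label-id")

# Assumed usage (check the signature for your MongoEngine version):
#   cursor = Image.objects.aggregate(pipeline)
#   label_dict = next(cursor)["label"]   # a plain dict, not a Label instance
```

The result is still a raw dict, so it needs converting if a Label instance is required.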

I will also say it is unusual to add an _id to EmbeddedDocuments. An _id usually lives only on the top-level document (or as a reference to another document, e.g. a ReferenceField).

Hi Shane and thanks a lot for your answer.

However I think I’m still a bit confused.

The syntax you suggested allows me to find the parent image in the database, based on the label ID. But I’m looking for the embedded label itself. Are you suggesting that I then find the embedded label in a second step, with something like this?

label = next((label for label in image.labels if label.id == label_id), None)

My first intuition was to query the embedded label directly from the database, with something like:

label = Image.objects.get(id=image_id).labels.get(id=label_id)

This query does require knowing the parent image ID, which I can obtain in a previous step with something like:

image_id = Image.objects(labels__id=label_id).first().id

Does this solution seem wrong to you?

Thanks also for your remark on using _id on EmbeddedDocuments. I must admit that I can’t get my head around not using these IDs. I feel like they are very useful to identify a given embedded document without relying on all its unique fields. But this probably shows that I haven’t yet fully understood how to work with document-oriented databases.

You can use the positional $ operator in a projection to get only the matching array element. For example:

class MyDoc(Document):
    a = ListField()
    c = ListField()

MyDoc(a=[{"b": 2}, {"b": 3}], c=[1, 2, 3]).save()

# https://www.mongodb.com/docs/manual/reference/operator/projection/positional/
doc = MyDoc.objects(a__b=2).fields(**{"a.$": 1}).first()
assert doc.a == [{"b": 2}]
assert doc.c == []
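Applied to the Image/Label schema from the question, the same idea might look like the sketch below. It is untested against a live database: fetch_label is a hypothetical helper, and _FakeQuerySet is a stand-in that mimics only the three QuerySet calls used, so the snippet runs without MongoDB. With the real model, Image itself would be passed as image_cls.

```python
from types import SimpleNamespace


def fetch_label(image_cls, label_id):
    """Return the embedded label with this id, or None if no image contains it."""
    image = (
        image_cls.objects(labels__id=label_id)  # match the parent image
        .fields(**{"labels.$": 1})              # project only the matched array element
        .first()
    )
    if image is None or not image.labels:
        return None
    return image.labels[0]


class _FakeQuerySet:
    """Hypothetical stand-in for a QuerySet, for illustration only."""

    def __init__(self, images, label_id=None):
        self._images, self._label_id = images, label_id

    def __call__(self, labels__id):
        return _FakeQuerySet(self._images, labels__id)

    def fields(self, **projection):
        return self

    def first(self):
        for image in self._images:
            matched = [lab for lab in image.labels if lab.id == self._label_id]
            if matched:
                return SimpleNamespace(labels=matched[:1])
        return None


FakeImage = SimpleNamespace(objects=_FakeQuerySet([
    SimpleNamespace(labels=[SimpleNamespace(id="L1"), SimpleNamespace(id="L2")]),
]))

label = fetch_label(FakeImage, "L2")
```

This avoids loading every label of the parent image, at the cost of still returning the label wrapped in a partially loaded Image.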