How to sort custom fields using the Attribute Pattern

Brian_McCarthy · May 30, 2021, 12:44pm

I am trying to find a good way to be able to sort on a collection that uses the Attribute Pattern. I have a collection of “jobs” where each customer can have up to 500,000 jobs. Each customer can define custom fields they want to store for all of their jobs which I’ve stored using the attribute pattern (example shown below).

I want to be able to write a query which returns all jobs for a single customer where their favorite number is greater than or equal to 100, sorted by their favorite number.

I am planning on having an index on { "customerId": 1, "attr.k": 1, "attr.v": 1 }

My query to filter the records would be the following I think: { "cutomerId": 1, "attr": { $elemMatch: { "attr.k": "favoriteNumber", "attr.v": { $gte: 100 } } } }

My question is, how do I sort by the “favorite number” using this pattern? Is there a better representation of the data that would allow me to have custom fields per customer and sort/filter based on those custom fields?

Example structure of “jobs” collection so far:

{
    "jobId": 1,
    "customerId": 1,
    "attr": [
        { "k": "favoriteNumber", "v": 42 },
        { "k": "favoriteColor", "v": "green" }
    ]
}

[EDIT] - My initial idea - One idea I had to solve this was to store the attributes twice on the document like below, which would let me filter efficiently (I think) using the attribute pattern, and sort using the dictionary representation. I am looking to see if there is a better (or at least alternatives) because this approach stores duplicate data for each document which could add up significantly. Also, I’m not sure this sort would be able to use an index?

{
    "jobId": 1,
    "customerId": 1,
    "attr": [
        { "k": "favoriteNumber", "v": 42 },
        { "k": "favoriteColor", "v": "green" }
    ],
   "attrSort": {
      "favoriteNumber": 42,
      "favoriteColor": "green"
   }
}

Pavel_Duchovny · May 31, 2021, 5:21am

Hi @Brian_McCarthy

Welcome to MongoDB community.

The additional fields for sorting is a way I would go for, have you tried including each of those to the index and test? Make sure this is the last field in the index (order should be Equality Sort Range).

The alternative is probably to use aggregation and unwind the attributes after a match and sort in memory which might be expensive…

One thing I would avoid is a subdocument and just store a plain field like “favouriteNumber”.

Will you need only the colour and favourite number as sort options or any kind of attribute can be sortable?

Also does application need the entire document back or just the sorted values?

Thanks
Pavel

Brian_McCarthy · May 31, 2021, 1:58pm

Hi @Pavel_Duchovny,

A little more context - Each customer (2,000 + customers at the moment) will have their own attributes. Probably about ~8-10 per customer, and different per customer. So Customer A may have “favoriteColor”, “favoriteNumber” (and 6 others), and Customer B may have “leastFavoriteColor”, “leastFavoriteNumber” (and 6 different ones).

If I don’t know ahead of time what these “customer defined fields” (these 8-10 properties) are ahead of time, I don’t think I can add them to an index? I am still learning what is possible with mongodb so I am hoping there is a good solution to this.

Every query will only be for a single customer - not sure if that information helps come up with a solution.

Why do you suggest avoiding a sub-document and storing the attributes directly on the root? My concern here is that customers can choose their own attribute names and I don’t want those names to conflict with those on our current document.

To answer your questions:

The current ask, is that all customer defined fields (“favoriteNumber”, “leastFavoriteNumber”, “favoriteColor”, “leastFavoriteColor” in my example) be sortable + filterable.
The entire document needs to be returned and not just the sorted values. In my example, a customer will want to know all “Jobs” where the favorite color is “red”, sorted by favoriteNumber.

My example is kind of contrived so if the asks are not clear, let me know.

Thanks again!
Brian

Pavel_Duchovny · June 1, 2021, 5:47am

Hi @Brian_McCarthy,

I assume that in this case its hard to find an index to suite all conditions.

If you don’t know the names of the fields the attribute pattern indexing for filtering is probably the best here.

Now to sort the data you would probably need to add those additional fields and since you cannot add each of those to a predefined index you will sort them in memory.

If the sort is in memory and the sort fields are not included in the index it will not add another multi-key layer to the index so you can have them under a sub Document.

How much documents are expected to be filtered out by { "customerId": 1, "attr.k": 1, "attr.v": 1 } index?

If the numbers are low (hundreds to low thousands) the in-memory sort might not be such a big overhead.

Best regards,
Pavel

Brian_McCarthy · June 1, 2021, 12:26pm

Hi @Pavel_Duchovny,
Unfortunately, the number of items that can be included in that filter are probably in the few hundred thousand. I think a bigger issue may be that the user wishes to only filter on { “customerId”: 1 } and wants to sort ALL of their jobs by one of their “customer defined fields” (not sure what is the proper term for fields the customer adds to each document). This could result in as many as 500k results.

Are there any well defined patterns to solve this type of issue where there are fields for each customer that should be sorted but not known ahead of time? If it involves storing data in a different way, that is always an option too.

Thanks again,
brian

Pavel_Duchovny · June 1, 2021, 1:44pm

@Brian_McCarthy,

I am not aware of a magical way to dynamically index and use for sort on unknown set of fields.

A query can use the attributes index when sorting on the first attribute “v” field so maybe priorities which attribute is placed first.

Other than that you may need the in memory sort or lose the attributes patten and maybe cluster data into separate collection but its afar fetch…

Thanks
Pavel

Manuel_Reil · July 23, 2021, 8:40am

We have exactly the same problem. Customers can have 100k to millions of records with functionally free, but technical well designed schema. Requirements are free filtering and sorting. Attribute pattern is performing suprisingly well without filter. Once you add filtering, performance suffers massively (in memory sort).

Brian_McCarthy · July 23, 2021, 12:44pm

My understanding is that the attribute pattern is specifically for indexing these the “key” and “value” of the attributes so it would allow for efficient not in-memory filtering.

Having an index on { “attr.k”: 1, “attr.v”: 1 } would let you filter by favoriteNumber or favoriteColor.

    {
        "jobId": 1,
        "customerId": 1,
        "attr": [
            { "k": "favoriteNumber", "v": 42 },
            { "k": "favoriteColor", "v": "green" }
        ]
    }

Do you have a similar index and you still see performance issues?

My issue was specifically regarding sorting the results by ONE of the attributes.

Pedro_Lievana · February 7, 2022, 12:15pm

Hi @Brian_McCarthy we are in the same situation than you. Do you have any solution? I have repeated the question in Attribute Pattern - How to sort by some attribute value

Mario_Miniero · January 8, 2024, 10:41am

Hi, you find a solution? i have the same problem