Multiple criteria search / query targeting issue

Hello there,

I'm developing my first big project using MongoDB, and I have several issues with query targeting.

To sum up my project: I have a DB with a huge number of vehicles (80k+), and I want the customer to be able to query a specific vehicle with multiple filters that are all optional (make, model, maxMileage, minMileage, minPrice, maxPrice, bodyColor, …).

Currently I am creating indexes with the Performance Advisor on Atlas, but since I have a lot of criteria to filter on, it doesn't work in 100% of cases.

I read the documentation about indexes, but I find it very difficult to understand. Do you have an example use case with queries that filter on multiple criteria which are all optional?

Thanks for your help!

Hi @Robin_J welcome to the community!

Perhaps you’re looking for the Attribute pattern?

The tldr is, instead of doing this:

{
  _id: 0,
  make: 'xxx',
  model: 'yyy',
  mileage: 1234,
  ...
}

you do this:

{
  _id: 0,
  attributes: [
    {key: 'make', value: 'xxx'},
    {key: 'model', value: 'yyy'},
    {key: 'mileage', value: 1234},
    ...
  ]
}

Then you can create a single index on the attributes field and index all of its contents efficiently. Please see the linked page for more details. The Polymorphic pattern is another one you may find interesting as well.
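
For instance (a minimal sketch; the collection name vehicles is just a placeholder), one compound index on the key/value pair can then serve queries on any attribute:

> db.vehicles.createIndex({ "attributes.key": 1, "attributes.value": 1 })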

Note that this is just an idea off the top of my head. It’s best to test this schema’s performance and usability according to your specific use case.

Best regards
Kevin

Hello @kevinadi,

Thanks for your help, it was a great idea, it works very well!
However, I have a problem filtering with $gte and $lte on keys like "maxPrice" and "minPrice".
All "value" entries are integers when the key is "maxPrice" or "minPrice". Is it not working because the values for other keys, for example "model", are not integers?

Thanks !

If you need to search for something with a specific value range in an array of sub-documents, you should use $elemMatch.

For example:

> db.test.find()
[
  {
    _id: 0,
    attributes: [
      { k: 'make', v: 'cheapcar' },
      { k: 'price', v: 1000 },
      { k: 'mileage', v: 10000 }
    ]
  },
  {
    _id: 1,
    attributes: [
      { k: 'make', v: 'expensivecar' },
      { k: 'price', v: 10000 },
      { k: 'mileage', v: 12000 }
    ]
  }
]

Finding cars where the price is greater than 5000:

> db.test.find({attributes: {$elemMatch: {k: 'price', v: {$gt:5000}}}})
[
  {
    _id: 1,
    attributes: [
      { k: 'make', v: 'expensivecar' },
      { k: 'price', v: 10000 },
      { k: 'mileage', v: 12000 }
    ]
  }
]

There are more interesting examples on the page Query an Array of Embedded Documents.

Is this the method you need? If not, could you please provide some example data & the desired output?

Best regards
Kevin

Yes sure, this is an example of one of my documents:

{
  _id: 1,
  ...
  attributes: [
    { key: 'totalPrice', value: 40600 },
    { key: 'mileage', value: 67953 },
    { key: 'firstRegistrationDate', value: 2019 },
    { key: 'make', value: 9 },
    { key: 'model', value: 19155 },
    { key: 'hp', value: 190 },
    { key: 'numberOfSeats', value: 5 },
    {
      key: 'modelVersionInput',
      value: '40 TDI Q SPORT LM20 eSITZE KAMERA AHK ACC '
    },
    { key: 'fuelCategory', value: 1 },
    { key: 'transmissionType', value: 1 },
    { key: 'numberOfDoors', value: 5 },
    { key: 'bodyType', value: 3 },
    { key: 'bodyColor', value: 10 },
    {
      key: 'equipments',
      value: [4, 50, 20, 130, 187, 140, 239, 38, 133, 158, 139, 6, 34, 221, 23, 124, 224, 157, 153, 11],
    }
  ]
}

FYI, the make, model, fuelCategory, transmissionType, bodyType, bodyColor and equipments values are IDs.

The customer has to be able to query vehicles with as many criteria as he wants. For example, something like this:

{
  $and: [
    { "attributes.key": "make", "attributes.value": 9 },
    { 
      $or: [
        { "attributes.key": "model", "attributes.value": 19155 },
        { "attributes.key": "model", "attributes.value": 19156 },
        { "attributes.key": "model", "attributes.value": 19157 },
      ] 
    },
    { attributes: { key: "totalPrice", value: { $lte: 50000 } } },
    { 
      $or: [
        { "attributes.key": "bodyColor", "attributes.value": 1 },
        { "attributes.key": "bodyColor", "attributes.value": 2 },
        { "attributes.key": "bodyColor", "attributes.value": 10 },
      ] 
    },
    { "attributes.key": "equipments", "attributes.value": 4 },
    { "attributes.key": "equipments", "attributes.value": 50 },
    { "attributes.key": "equipments", "attributes.value": 20 },
    { "attributes.key": "equipments", "attributes.value": 130 },
  ]
}

Moreover, if possible I would like to sort these results by totalPrice (ascending and descending), mileage, firstRegistrationDate…

Regards,

Actually, this solution is working, but it is sometimes very slow (more than 10 seconds, even though I limit the results to 12 out of 80k).

Hi @Robin_J

Although the attribute pattern is a good pattern when you can’t guarantee that all documents follow a certain schema, it is not a silver bullet unfortunately.

From Building with Patterns: The Attribute Pattern, the Attribute Pattern is particularly well suited when:

  • We have big documents with many similar fields but there is a subset of fields that share common characteristics and we want to sort or query on that subset of fields, or
  • The fields we need to sort on are only found in a small subset of documents, or
  • Both of the above conditions are met within the documents.

The attribute pattern allows you to minimize the number of indexes you need to create, but if this approach is not performant for your use case, then you might need to combine it with other patterns. Here’s a good article: Building with Patterns: A Summary

The key to better performance is to check the explain plan output, which shows how the query planner plans to answer the query. If you run db.collection.explain('executionStats').find(....) it will execute the plan and show you how much time it spends on each stage, along with how many documents/index keys are scanned and returned. Ideally you want as few documents/index keys scanned as possible relative to the documents returned, which means your query is very targeted and your indexes are working well.
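
For example (a sketch; the collection name and filter are placeholders), the executionStats section of the output lets you compare how much was scanned against how much was returned:

> db.vehicles.explain('executionStats').find(
    { attributes: { $elemMatch: { key: 'make', value: 9 } } }
  )
/* in executionStats, compare nReturned against totalKeysExamined and
   totalDocsExamined; a large gap usually means the index is not selective enough */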

This design step, however, is very use case specific, so the solutions are very customized for your use case. The same schema & index design would not necessarily work well with the same document structure but with different query patterns.

There’s a MongoDB University course that specializes in this, M320 Data Modeling, which may be useful for you.

A note about the $or operator though: it works differently compared to other operators, and indexes may need to be optimized specifically with the use of $or in mind. Otherwise you can get a big hit in performance. Please see $or Clauses and Indexes.
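
To illustrate (a sketch, assuming the attribute-pattern schema above and a placeholder vehicles collection): each $or clause is planned separately, so every clause must be able to use an index on its own, otherwise the whole query falls back to a collection scan.

/* with the attribute pattern, this one compound index can support every clause
   below, because they all filter on the same two fields */
> db.vehicles.createIndex({ "attributes.key": 1, "attributes.value": 1 })

> db.vehicles.find({
    $or: [
      { attributes: { $elemMatch: { key: "model", value: 19155 } } },
      { attributes: { $elemMatch: { key: "model", value: 19156 } } }
    ]
  })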

Best regards
Kevin


With the sample query you shared, I want to emphasize the need to use $elemMatch, as expressed by @kevinadi, to make your query work correctly.

Your query will match too many documents. All documents with attributes.key:make and any of the values 9, 19155, 1, 2, 4 will also be matched, which is not what you intend.

/* the whole collection */
mongosh> ap.find()
> { _id: 1,
  attributes: [ { k: 'make', v: 19155 }, { k: 'model', v: 9 } ] }
> { _id: 2,
  attributes: [ { k: 'make', v: 9 }, { k: 'model', v: 19155 } ] }

/* the wrong query that matches more documents than is intended */
mongosh> ap.find( { "attributes.k" : "make" , "attributes.v" : 9 })
> { _id: 1,
  attributes: [ { k: 'make', v: 19155 }, { k: 'model', v: 9 } ] }
> { _id: 2,
  attributes: [ { k: 'make', v: 9 }, { k: 'model', v: 19155 } ] }

/* the correct query that only matches the correct make */
mongosh> ap.find( { "attributes" : { "$elemMatch" : { "k" : "make" , "v" : 9 }} })
> { _id: 2,
  attributes: [ { k: 'make', v: 9 }, { k: 'model', v: 19155 } ] }
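
Applied to the sample query shared earlier, a corrected version would look roughly like this (a sketch only; vehicles is a placeholder collection name and the key/value field names follow that sample document):

> db.vehicles.find({
    $and: [
      { attributes: { $elemMatch: { key: "make", value: 9 } } },
      { attributes: { $elemMatch: { key: "model", value: { $in: [19155, 19156, 19157] } } } },
      { attributes: { $elemMatch: { key: "totalPrice", value: { $lte: 50000 } } } },
      { attributes: { $elemMatch: { key: "bodyColor", value: { $in: [1, 2, 10] } } } },
      { attributes: { $elemMatch: { key: "equipments", value: { $all: [4, 50, 20, 130] } } } }
    ]
  })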

Thank you @kevinadi and @steevej for your help.
It is much appreciated.

I tried the $elemMatch solution; unfortunately, the results were still very slow to come back.

The best solution, unfortunately, was to create a SQL DB containing only the values I need for the queries plus the MongoDB _id.
With the new approach, I have results in around 100ms against 10 to 20 seconds for the previous one.

I kept MongoDB to store all the other data, since the document structure suits it much better than a SQL database.

Thanks !

Hi @Robin_J glad you have found a workable solution!

If you don’t mind describing your final solution, it will be greatly appreciated. I’m wondering if this is a use case that can be improved, either with a new design pattern, or something else.

Thanks!
Kevin
