Paginate over chunked data of different size

NeNaD · October 19, 2022, 7:31am

I have a bit of a tricky use case and would love to hear suggestions from the community on the best way to do it.

So, I have 2 collections:

Companies

[
  {
    "_id": 1,
    "priority": 1,
    "number_of_products_to_display": 3
  },
  {
    "_id": 2,
    "priority": 2,
    "number_of_products_to_display": 4
  },
  {
    "_id": 3,
    "priority": 3,
    "number_of_products_to_display": 7
  }
 ]

Products

[
  {
    "_id": 1,
    "company_id": 1
  },
  {
   "_id": 2,
    "company_id": 1
  },
  {
    "_id": 3,
    "company_id": 2
  },
  {
   "_id": 4,
    "company_id": 3
  },
  {
    "_id": 5,
    "company_id": 2
  }
]

I want to paginate over the products in chucks of 10, but based on the company priority and number_of_products_to_display.

For example:

First chuck of 10 should include:

latest 3 products from the first company (since it has the highest priority and number_of_products_to_display equal to 3).
latest 2 products from the second company
latest 5 products from the third company

Second chuck of 10 should include:

latest 2 products from the third company (because they were not returned in the first chuck)
second latest 3 products from the first company (we are going back to first company since we already covered all the companies)
second latest 2 products from the second company
second latest 3 products from the third company

We should circle over all companies in this manner until all the products are fetch.

All suggestions are welcomed!

Pavel_Duchovny · October 19, 2022, 11:15am

Hi @NeNaD ,

So running such a logic in one aggregation will not be the most efficient way.

The efficient way is to get the first 10 companies based on priority

db.companies.find({}).sort({priority :  -1})

And then get the relevant 10 documents from the product:

db.products.aggregate([{
 $match: {
  company_id: <HIGH_PRIORITY_ID>
 }
}, {
 $limit: <CORRESPONDING_LIMIT>
}, {
 $unionWith: {
  coll: 'products',
  pipeline: [
   {
    $match: {
     company_id: <NEXT_HIGH_PRIORITY_ID>
    }
   },
   {
    $limit: <CORRESPONDING_LIMIT>
   }
...
// The next N companies

  ]
 }},
{ $skip : 0 },
{ $limit : 10}
])

Now if you want to still try a one go show here is the agg , but its complex and not super efficient:

db.companies.aggregate(
[{
 $sort: {
  priority: -1
 }
}, {
 $lookup: {
  from: 'products',
  localField: '_id',
  foreignField: 'company_id',
  as: 'products'
 }
}, {
 $addFields: {
  products: {
   $slice: [
    '$products',
    '$number_of_products_to_display'
   ]
  }
 }
}, {
 $unwind: {
  path: '$products'
 }
}, {
 $skip: 0
}, {
 $limit: 10
}]);

Thanks
Pavel

NeNaD · October 19, 2022, 12:46pm

@Pavel_Duchovny Ingenious!

I am working with MongoDB for a long time and used $unionWith many times, but I didn’t know you can use $unionWith on the same collection you are doing the aggregation!

The only thing that is missing here is the option to iterate over in the next cycle once all the company products are fetched at least once.

So for example, let’s say I have 2 companies like this:

[
  {
    "_id": 1,
    "priority": 1,
    "number_of_products_to_display": 7
  },
  {
    "_id": 2,
    "priority": 2,
    "number_of_products_to_display": 8
  }
 ]

With you current solution, the whole aggregate will stop in the second iteration and it will return only 5 results (all five from the second company).

What I would like to happen is that the cycle would continue, where the second aggregate would also return 10 results (5 from second company and 5 from first company again).

Any idea how to add on top of your current answer that would cover that as well?

Btw, I am referencing your first solution with company data prefetched before the aggregation.

Pavel_Duchovny · October 19, 2022, 1:01pm

Hi @NeNaD ,

2 options :

Using a cursor batch of 10 and getting next batch until exhausted.

2 . With the skip and limit When First round it is skip : 0 limit : 10 next one is skip: 10 limit : 10 and so on…

Ty
Pavel

NeNaD · October 19, 2022, 1:11pm

Hi @Pavel_Duchovny,

Thanks for the quick response!

As I tried to said, this will not work since it will stop after the first cycle.

In your solution, the $limit is used on each $unionWith, so when we got to the last stage for global $limit and $skip, we will always have only the data from the first cycle to iterate.

In my example above, the second iteration will be the final, since it will not continue to iterate over the data because of the inner $limit in each $unionWith stage.

Did I explained it properly?

Pavel_Duchovny · October 19, 2022, 2:23pm

Oh so the display size is actually a batch size?

If so then add a skip of x*display(batch)for every union with that you rerun

Why not to precalculate the priority of each product document so that you will just use sorr.

For the highest priority company all first x products will get priority 50 , the next will get priority 49 etc…

Consider this as a pagination score pattern