14 x Aggregation stages to get latest published books and magazined by store?

Hi

This working Mongo Playground is my attempt to fetch the latest Books and Magazines (sub-collections) at each Store.

  • Latest Books: Get the latest books by publish_date and group them by genre
  • Latest Magazines: Get the latest magazines by publish_date

MongoPlayground

The above takes a whopping 14 x aggregation steps to achieve. The result is workable although not 100% in the final shape I’m after (see below), and I have concerns about it’s performance long term.

Objectives:

  1. My primary objective is to reduce the number of aggregations steps (hopefully less = more performant)
  2. My secondary objective is to get the result in the shape I’m after (see below)

Any assistance and guidance would be greatly appreciated.

My collection is in the following shape:

{
  store_id: "1001",
  books: [
    {
      genre: "fiction",
      publish_date: {
        $date: "2024-02-01T00:00:00.000Z",
      },
      store_id: "1001",
      title: "Book1",
      price: "100",
    },
    {
      genre: "fiction",
      publish_date: {
        $date: "2024-02-29T00:00:00.000Z",
      },
      store_id: "1001",
      title: "Book1",
      price: "120",
    },
    {
      publish_date: {
        $date: "2024-02-01T00:00:00.000Z",
      },
      genre: "fiction",
      store_id: "1001",
      title: "Book2",
      peice: "120",
    },
  ],
  magazines: [
    {
      store_id: "1001",
      title: "Title1",
      price: "50",
      publish_date: {
        $date: "2024-02-01T00:00:00.000Z",
      },
    },
    {
      price: "80",
      publish_date: {
        $date: "2024-02-29T00:00:00.000Z",
      },
      store_id: "1001",
      title: "Title1",
    },
    {
      price: "100",
      publish_date: {
        $date: "2024-02-01T00:00:00.000Z",
      },
      store_id: "1001",
      title: "Title2",
    },
  ],
}

The shape of the result I’d ideally like to achieve is:

{
  _id: "65e67208d31590f70d7b1061",
  store_id: "1001",
  latest_books: [
    [
      {
        genre: "novel",
        books: [
          {
            genre: "novel",
            publish_date: {
              $date: "2024-02-29T00:00:00.000Z",
            },
            store_id: "1001",
            price: "100",
            title: "Book4",
          },
        ],
      },
      {
        genre: "fiction",
        books: [
          {
            genre: "fiction",
            publish_date: {
              $date: "2024-02-29T00:00:00.000Z",
            },
            store_id: "1001",
            title: "Book2",
            price: "150",
          },
          {
            genre: "fiction",
            publish_date: {
              $date: "2024-02-29T00:00:00.000Z",
            },
            store_id: "1001",
            title: "Book1",
            price: "120",
          },
          {
            genre: "fiction",
            publish_date: {
              $date: "2024-02-01T00:00:00.000Z",
            },
            store_id: "1001",
            title: "Book3",
            price: "50",
          },
        ],
      },
      {
        genre: "mystery",
        books: [
          {
            genre: "mystery",
            title: "Book5",
            store_id: "1001",
            price: "100",
            publish_date: {
              $date: "2024-02-01T00:00:00.000Z",
            },
          },
        ],
      },
    ],
  ],
  latest_magazines: [
    {
      price: "100",
      publish_date: {
        $date: "2024-02-01T00:00:00.000Z",
      },
      store_id: "1001",
      title: "Title4",
    },
    // ...
  ],
}

Hello,

Please correct me if I have not understood your requirement correctly.
Your requirement seems - just sorting, and there is no aggregation as such to do.

Let me try this by applying a minor change in the schema. The “magazines” have been merged into “books” array. This has been classified as a special genre “magazine”. This minor modification in schema will make a straight path to the desired output. This modification will also help to make the query performant by providing a compound index on the same three fields in the sort method. It may also be useful to add one exclusive tag to make this classification more clear.

The key point is “sorting”. We need to go through this aggregation framework just for sorting. Since this kind of sorting - based on the embedded documents, is possible in aggregate method. The simple sort method on the cursor returned by find method does not do the same. mongodb - How to sort sub-documents in the array field? - Stack Overflow

Query: The books array has been unwound, then sorted on the desired keys. Please note the descending order by -1 for the latest publications first.

test> db.books.aggregate({$project:{"books.store_id":1, "books.genre":1, "books.publish_date":1}}, {$unwind:"$books"}, {$sort:{"books.store_id":1,"books.genre":1,"books.publish_date":-1}});

Please see if it is of any use.

Schema - modified

test> db.books.find();
[
  {
    _id: ObjectId('65e6cc866522000001383ed0'),
    books: [
      {
        genre: 'fiction',
        publish_date: { date: '2024-02-01T00:00:00.000Z' },
        store_id: '1001',
        title: 'Book1',
        price: '100'
      },
      {
        genre: 'fiction',
        publish_date: { date: '2024-02-29T00:00:00.000Z' },
        store_id: '1001',
        title: 'Book1',
        price: '120'
      },
      {
        publish_date: { date: '2024-02-01T00:00:00.000Z' },
        genre: 'fiction',
        store_id: '1001',
        title: 'Book2',
        peice: '120'
      },
      {
        publish_date: { date: '2024-02-01T00:00:00.000Z' },
        genre: 'magazine',
        store_id: '1001',
        title: 'Title1',
        price: '50'
      },
      {
        publish_date: { date: '2024-02-29T00:00:00.000Z' },
        genre: 'magazine',
        store_id: '1001',
        title: 'Title1',
        price: '80'
      },
      {
        publish_date: { date: '2024-02-01T00:00:00.000Z' },
        genre: 'magazine',
        store_id: '1001',
        title: 'Title2',
        price: '100'
      }
    ]
  }
]
test> 

Output:

[
  {
    _id: ObjectId('65e6cc866522000001383ed0'),
    books: {
      genre: 'fiction',
      publish_date: { date: '2024-02-29T00:00:00.000Z' },
      store_id: '1001'
    }
  },
  {
    _id: ObjectId('65e6cc866522000001383ed0'),
    books: {
      genre: 'fiction',
      publish_date: { date: '2024-02-01T00:00:00.000Z' },
      store_id: '1001'
    }
  },
  {
    _id: ObjectId('65e6cc866522000001383ed0'),
    books: {
      publish_date: { date: '2024-02-01T00:00:00.000Z' },
      genre: 'fiction',
      store_id: '1001'
    }
  },
  {
    _id: ObjectId('65e6cc866522000001383ed0'),
    books: {
      publish_date: { date: '2024-02-29T00:00:00.000Z' },
      genre: 'magazine',
      store_id: '1001'
    }
  },
  {
    _id: ObjectId('65e6cc866522000001383ed0'),
    books: {
      publish_date: { date: '2024-02-01T00:00:00.000Z' },
      genre: 'magazine',
      store_id: '1001'
    }
  },
  {
    _id: ObjectId('65e6cc866522000001383ed0'),
    books: {
      publish_date: { date: '2024-02-01T00:00:00.000Z' },
      genre: 'magazine',
      store_id: '1001'
    }
  }
]

Thanks
WeDoTheBestFor4

@wedothebest_We_do_the_Best thanks for the feedback.

The data schema I have described in my post above has been significantly slimmed down; the actual data schema is far more complex and where possible, I would prefer to solve this without a schema change.

Thanks for your efforts though.

Hello,

Thank you for your time to see the update, and clarifying your needs in detail.
Please see another update, as you have requested, have kept the schema intact.

**Query:** 
Step 1: Set 1 has been prepared by $unwind books array.
Step 2: Set 2 by $unwind magazines array.
Step 3: Set 1 and Set2 made into a union by $unionWith
Step 4: The unionised set sorted on "store Id, genre, publish date".

db.booksmagazines.aggregate({
    $project: {
        _id: 0,
        'books.store_id': 1,
        'books.genre': 1,
        'books.publish_date': 1,
        'books.title': 1,
        'books.price': 1
    }
}, {
    $unwind: '$books'
}, {
    $sort: {
        'books.store_id': 1,
        'books.genre': 1,
        'books.publish_date': -1
    }
}, {
    $unionWith: {
        coll: 'booksmagazines',
        pipeline: [{
            $project: {
                _id: 0,
                'magazines.store_id': 1,
                'magazines.genre': 1,
                'magazines.publish_date': 1,
                'magazines.title': 1,
                'magazines.price': 1
            }
        }, {
            $unwind: '$magazines'
        }, {
            $sort: {
                'magazines.store_id': 1,
                'magazines.genre': 1,
                'magazines.publish_date': -1
            }
        }]
    }
});

**Output:**
[
  {
    books: {
      genre: 'fiction',
      publish_date: { '$date': '2024-02-29T00:00:00.000Z' },
      store_id: '1001',
      title: 'Book1',
      price: '120'
    }
  },
  {
    books: {
      genre: 'fiction',
      publish_date: { '$date': '2024-02-01T00:00:00.000Z' },
      store_id: '1001',
      title: 'Book1',
      price: '100'
    }
  },
  {
    books: {
      publish_date: { '$date': '2024-02-01T00:00:00.000Z' },
      genre: 'fiction',
      store_id: '1001',
      title: 'Book2'
    }
  },
  {
    magazines: {
      price: '80',
      publish_date: { '$date': '2024-02-29T00:00:00.000Z' },
      store_id: '1001',
      title: 'Title1'
    }
  },
  {
    magazines: {
      store_id: '1001',
      title: 'Title1',
      price: '50',
      publish_date: { '$date': '2024-02-01T00:00:00.000Z' }
    }
  },
  {
    magazines: {
      price: '100',
      publish_date: { '$date': '2024-02-01T00:00:00.000Z' },
      store_id: '1001',
      title: 'Title2'
    }
  }
]

**Sample date:**
db.booksmagazines.find();
[
  {
    _id: ObjectId('65e8422e8b05ee991300eeb9'),
    store_id: '1001',
    books: [
      {
        genre: 'fiction',
        publish_date: { '$date': '2024-02-01T00:00:00.000Z' },
        store_id: '1001',
        title: 'Book1',
        price: '100'
      },
      {
        genre: 'fiction',
        publish_date: { '$date': '2024-02-29T00:00:00.000Z' },
        store_id: '1001',
        title: 'Book1',
        price: '120'
      },
      {
        publish_date: { '$date': '2024-02-01T00:00:00.000Z' },
        genre: 'fiction',
        store_id: '1001',
        title: 'Book2',
        peice: '120'
      }
    ],
    magazines: [
      {
        store_id: '1001',
        title: 'Title1',
        price: '50',
        publish_date: { '$date': '2024-02-01T00:00:00.000Z' }
      },
      {
        price: '80',
        publish_date: { '$date': '2024-02-29T00:00:00.000Z' },
        store_id: '1001',
        title: 'Title1'
      },
      {
        price: '100',
        publish_date: { '$date': '2024-02-01T00:00:00.000Z' },
        store_id: '1001',
        title: 'Title2'
      }
    ]
  }
]

Thanks
WeDoTheBestFor4

Hello,

Your original schema has been retained as such.
Kindly see another update - much more simplified with $sortArray.

Query: A two stage aggregation pipeline

  1. First stage - sorting the array of books
  2. Second stage : sorting the array of magazines
 db.booksmagazines.aggregate([{$set:{"books":{$sortArray:{input:"$books",sortBy:{store_id:1, publish_date:-1,genre:1,title:1}}}}},{$set:{"magazines":{$sortArray:{input:"$magazines", sortBy:{stored_id:1, publish_date:-1, title:1}}}}}])

Output:

[
  {
    _id: ObjectId('65e8422e8b05ee991300eeb9'),
    store_id: '1001',
    books: [
      {
        genre: 'fiction',
        publish_date: { '$date': '2024-02-29T00:00:00.000Z' },
        store_id: '1001',
        title: 'Book1',
        price: '120'
      },
      {
        genre: 'fiction',
        publish_date: { '$date': '2024-02-01T00:00:00.000Z' },
        store_id: '1001',
        title: 'Book1',
        price: '100'
      },
      {
        publish_date: { '$date': '2024-02-01T00:00:00.000Z' },
        genre: 'fiction',
        store_id: '1001',
        title: 'Book2',
        peice: '120'
      }
    ],
    magazines: [
      {
        price: '80',
        publish_date: { '$date': '2024-02-29T00:00:00.000Z' },
        store_id: '1001',
        title: 'Title1'
      },
      {
        store_id: '1001',
        title: 'Title1',
        price: '50',
        publish_date: { '$date': '2024-02-01T00:00:00.000Z' }
      },
      {
        price: '100',
        publish_date: { '$date': '2024-02-01T00:00:00.000Z' },
        store_id: '1001',
        title: 'Title2'
      }
    ]
  }
]

Thanks
WeDoTheBest4You