Best way to determine which indexes are needed for aggregation?

We have a fairly complicated aggregation query used for gluing some data together and providing it in a server-side paging application. What is the best way to determine which indexes this kind of complex query needs? I thought that MongoDB would “recommend indexes” like our SQL Server does. I have seen that in the past, but we have not received any guidance for this collection. It’s fairly new - is there some way to accelerate the recommendations?

The query is below as an example, but I don’t expect you to understand it without some explanation of our crazy data :roll_eyes:

[
  {
    $graphLookup:
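      /**
       * Recursively follow ParentRowId -> RowId within FlatSite (maxDepth 10),
       * collecting the ancestor rows into "Parents" with their depth in "level".
       */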
      {
        from: "FlatSite",
        startWith: "$ParentRowId",
        connectFromField: "ParentRowId",
        connectToField: "RowId",
        maxDepth: 10,
        as: "Parents",
        depthField: "level",
      },
  },
  {
    $match:
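      /**
       * Keep only documents whose LevelId is 2.
       */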
      {
        LevelId: 2,
      },
  },
  {
    $project:
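      /**
       * Flatten the ancestors' Columns arrays into a single ParentColumns array.
       */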
      {
        _id: 1,
        Columns: 1,
        Value: 1,
        RowId: 1,
        ParentColumns: {
          $reduce: {
            input: "$Parents.Columns",
            initialValue: [],
            in: {
              $concatArrays: [
                "$$value",
                "$$this",
              ],
            },
          },
        },
      },
  },
  {
    $project:
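      /**
       * Merge the document's own Columns with the inherited ParentColumns.
       */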
      {
        _id: 1,
        Value: 1,
        RowId: 1,
        Columns: {
          $concatArrays: [
            "$Columns",
            "$ParentColumns",
          ],
        },
      },
  },
  {
    $project:
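      /**
       * Keep only the three ColumnIds of interest.
       */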
      {
        _id: 1,
        Value: 1,
        RowId: 1,
        Columns: {
          $filter: {
            input: "$Columns",
            as: "column",
            cond: {
              $in: [
                "$$column.ColumnId",
                [
                  ObjectId(
                    "60707b306d3a5d6157bfe469"
                  ),
                  ObjectId(
                    "60707b336d3a5d6157bfe47c"
                  ),
                  ObjectId(
                    "64ad9acfff44fd298888d34e"
                  ),
                ],
              ],
            },
          },
        },
      },
  },
  {
    $match:
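      /**
       * Require a column with ColumnId 60707b306d3a5d6157bfe469 whose Value
       * matches "E-211" (case-insensitive).
       */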
      {
        Columns: {
          $all: [
            {
              $elemMatch: {
                ColumnId: ObjectId(
                  "60707b306d3a5d6157bfe469"
                ),
                $and: [
                  {
                    Value: /^E-211$/i,
                  },
                ],
              },
            },
          ],
        },
      },
  },
  {
    $project:
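      /**
       * Extract the two column entries used for sorting into SortColumn1/SortColumn2.
       */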
      {
        _id: 1,
        Columns: 1,
        Value: 1,
        RowId: 1,
        SortColumn1: {
          $arrayElemAt: [
            {
              $filter: {
                input: "$Columns",
                as: "c",
                cond: {
                  $eq: [
                    "$$c.ColumnId",
                    ObjectId(
                      "60707b306d3a5d6157bfe469"
                    ),
                  ],
                },
              },
            },
            0,
          ],
        },
        SortColumn2: {
          $arrayElemAt: [
            {
              $filter: {
                input: "$Columns",
                as: "c",
                cond: {
                  $eq: [
                    "$$c.ColumnId",
                    ObjectId(
                      "64ad9acfff44fd298888d34e"
                    ),
                  ],
                },
              },
            },
            0,
          ],
        },
      },
  },
  {
    $setWindowFields:
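      /**
       * Attach the total number of matching documents to every document as
       * TotalRecords (used for server-side paging).
       */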
      {
        output: {
          TotalRecords: {
            $count: {},
          },
        },
      },
  },
  {
    $sort:
      /**
       * Sort descending on the two extracted sort column values.
       */
      {
        "SortColumn1.Value": -1,
        "SortColumn2.Value": -1,
      },
  },
  {
    $skip:
      /**
       * Page offset: the number of documents to skip.
       */
      0,
  },
  {
    $limit:
      /**
       * Page size: the maximum number of documents to return.
       */
      100,
  },
  {
    $project:
      /**
       * Drop the helper sort fields from the final output.
       */
      {
        _id: 0,
        SortColumn1: 0,
        SortColumn2: 0,
      },
  },
]

Hey @Deanna_Delapasse1,

“Best” is a relative term here since it depends on various factors, but I have some thoughts on general approaches that might help you optimize your pipeline:

  1. Consider indexing the fields used in the earliest pipeline stages: filtering and sorting on an index before further processing can significantly boost performance. In your case that mostly means placing $match as early in the pipeline as possible so it can use an index (see the first sketch after this list), and being aware that $skip-based paging gets progressively more expensive for deep pages. These are general approaches and may not apply to your specific use case; for more detail on how the server rewrites certain pipelines, refer to Aggregation Pipeline Optimization.

    One more thing: I noticed you are using $project multiple times. Try refactoring the pipeline to see whether consecutive $project stages can be combined; fewer passes over each document is generally more efficient.

  2. Use the explain('executionStats') method: running the aggregation through it shows whether an index was used and which stages take the most time (see the example after this list).

    If you prefer a GUI, the query plan view in MongoDB Compass shows the same explain() output.

  3. You can also consider compound indexes when a single index can cover several of the filtered/sorted fields from the pipeline stages. Combining fields in one index can further improve performance, especially when querying on multiple criteria (see the last sketch after this list).

    Please note that MongoDB generally uses only one index to satisfy a query. To read more, please refer to the Indexing Strategies documentation.
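
Here is a minimal sketch for point 1. It assumes the pipeline runs against the FlatSite collection itself (its $graphLookup also reads from "FlatSite") and that the field names match your pipeline, so treat it as an illustration rather than a definitive recommendation:

// The recursive $graphLookup queries FlatSite on its connectToField for every
// traversed document, so an index on RowId is usually the biggest win:
db.FlatSite.createIndex({ RowId: 1 });

// The { LevelId: 2 } $match currently runs after $graphLookup and therefore cannot
// use an index. It does not reference the "Parents" array, so it can be moved to
// the front of the pipeline, where an index on LevelId can be used:
db.FlatSite.createIndex({ LevelId: 1 });

db.FlatSite.aggregate([
  { $match: { LevelId: 2 } }, // filter first, on an indexed field
  { $graphLookup: { from: "FlatSite", startWith: "$ParentRowId",
      connectFromField: "ParentRowId", connectToField: "RowId",
      maxDepth: 10, as: "Parents", depthField: "level" } },
  // ...remaining stages unchanged
]);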
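
For point 2, here is a hypothetical mongosh snippet, where pipeline stands for the stage array you posted:

const pipeline = [ /* ...the stages from your post... */ ];

db.FlatSite.explain("executionStats").aggregate(pipeline);

In the output, check the winning plan of the initial $cursor stage: COLLSCAN means no index was used, while IXSCAN shows which index was chosen.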
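
And an illustration for point 3: Columns is an array of embedded documents, so a compound multikey index on fields of the same element can serve an $elemMatch filter on ColumnId and Value. Note that it only helps if that filter runs against the stored Columns field (i.e. before your $project stages rebuild the array), and the case-insensitive /^E-211$/i regex limits how selective the index bounds can be:

db.FlatSite.createIndex({ "Columns.ColumnId": 1, "Columns.Value": 1 });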


May I ask what you are trying to accomplish with this aggregation pipeline? Is it something you run regularly, and would you be open to schema changes that could simplify the query?

If you need further help, please feel free to share some sample documents and the sample output you are expecting; this will help us assist you better.

Let us know if any of this helps or if you need further assistance!

Regards,
Kushagra