Why IXSCAN not enough?

My Report data structure is:

{
  Financials: [
  {
    "Sales": 23423423.0,
    "Last": false
  },
  {
    "Sales": 56756756.0,
    "Last": true
  }
 ]
}

Created any index with:

db.Report.createIndex(
    {
        "Financials.Sales": 1,
        "Financials.Last": 1
    }
);

Running query:

db.Report.count({ 
    "Financials" : { 
        "$elemMatch" : { 
            "Sales" : { 
                "$gte" : 100000000.0, 
                "$lte" : 1000000000.0 
            }, 
            "Last" : true 
        } 
    }
})

has two stages, an IXSCAN (3%) and a FETCH(96%) stage.

Why IXSCAN stage is not enough? How can I optimize query / index?
Thanks!

Hi @Anteaus ,
According to the multikey indexes limitations documentation, Multikey indexes cannot cover queries over array field(s).
You’ll probably need to change your data structure to cover the query using an index.
Goodluck,
Rafael,

1 Like

What change do you advice?
I dont see any chance to not using embeded array to store data like this?
Thanks!

you can move the financial data to a separate collection with documents like this:

{
  reportId: ...,
  Sales: ...,
  Last: ...
}

But if I need query Report like { Year: 2020, Financials:{ $elemMatch: {Sales: …, Last: …} } } it will need a join, which is also slow?

@Anteaus,
I don’t know all of your queries requirements but you may consider duplicating the data (keep it both in Report’s collection and in Financial’s collection).
In general, the data structure should be based on your queries.
For more information I recommend you to look at this course: M320: Data Modeling | MongoDB University
or follow this guide: https://university.mongodb.com/exam/guide#data-modeling-developer-only

I dont know how duplication will help me when need to query both Report and embeded Financials with covered index.

An other issue is $gte and $lte on one field. Query with just $gte is using index.

@Anteaus,
Covering multikey indexes is logically not possible, because the key entry in the index table covers only part of the array field.
In addition, it’s also not possible for count ops, because multiple keys may be associated with the same document (it is possible to count without fetching with additional processing).

I don’t know your queries requirements, but you may consider using the computed pattern.
For example add these fields to the documents:

  • numberOfSalesGt1M
  • numberOfSalesGt10M
  • numberOfSalesGt100M

and update them during update ops.

Hope that helps.
Goodluck,
Rafael,

Hi @Anteaus first of all, welcome to the community!

Actually a FETCH stage is not all bad. It just simply means that the server need to either: 1) Fetch the actual result documents to give to you, or 2) Fetch the document to further check the query parameters match to actual documents.

Not all queries need to be a covered query. Anecdotally, if all your queries are covered queries, then you need many, many indexes to cover all the queries you have. You definitely can have too many indexes in a collection (you can have at most 64 indexes in a collection) and with each additional index, writing to the collection involves updating all those indexes as well. This is on top fo what @Rafael_Green mentioned as well: a multikey index (index involving arrays) cannot be used for a covered query, due to how a multikey index works.

Thus, if your queries are adhering to the principles outlined in the Query Optimization page and are fast enough for your needs, it doesn’t really matter that the query has FETCH stage.

However if you’re very concerned about the query’s performance, please post

  • Your MongoDB version
  • The output of db.collection.explain('executionStats').find(....) of the query in question

Best regards
Kevin

3 Likes

Its totally OK if you need all of records (you cant skip fetch step) or do pagging (fetch with low limit is fast), but if you need only count why fetch all of records intstead just using index?

Use case: run ad hoc queries against Report collection, pagginate result and show total count of matching records.

Version: “4.2.6”

Query:

db.Report.explain().count({ 
    "Financials" : { 
        "$elemMatch" : { 
            "Sales" : { 
                "$gte" : 100000000.0, 
                "$lte" : 1000000000.0 
            }, 
            "Last" : true 
        } 
    }, 
    "Type" : { "$in" : [1] } 
})

Plan:

{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "data.Report",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"Financials" : {
						"$elemMatch" : {
							"$and" : [
								{
									"Last" : {
										"$eq" : true
									}
								},
								{
									"Sales" : {
										"$lte" : 1000000000
									}
								},
								{
									"Sales" : {
										"$gte" : 100000000
									}
								}
							]
						}
					}
				},
				{
					"Type" : {
						"$eq" : 1
					}
				}
			]
		},
		"queryHash" : "B29AE203",
		"planCacheKey" : "FFE5A870",
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"Financials" : {
							"$elemMatch" : {
								"$and" : [
									{
										"Sales" : {
											"$lte" : 1000000000
										}
									},
									{
										"Sales" : {
											"$gte" : 100000000
										}
									},
									{
										"Last" : {
											"$eq" : true
										}
									}
								]
							}
						}
					},
					{
						"Type" : {
							"$eq" : 1
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"Financials.Sales" : 1
				},
				"indexName" : "Financials.Sales_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"Financials.Sales" : [ "Financials" ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"Financials.Sales" : [ "[100000000, 1000000000]" ]
				}
			}
		},
		"rejectedPlans" : [
			{
				"stage" : "FETCH",
				"filter" : {
					"$and" : [
						{
							"Financials" : {
								"$elemMatch" : {
									"$and" : [
										{
											"Sales" : {
												"$lte" : 1000000000
											}
										},
										{
											"Sales" : {
												"$gte" : 100000000
											}
										},
										{
											"Last" : {
												"$eq" : true
											}
										}
									]
								}
							}
						},
						{
							"Type" : {
								"$eq" : 1
							}
						}
					]
				},
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"Financials.Sales" : 1,
						"Financials.Last" : 1
					},
					"indexName" : "Financials.Sales_1_Financials.Last_1",
					"isMultiKey" : true,
					"multiKeyPaths" : {
						"Financials.Sales" : [ "Financials" ],
						"Financials.Last" : [ "Financials" ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"Financials.Sales" : [ "[100000000, 1000000000]" ],
						"Financials.Last" : [ "[true, true]" ]
					}
				}
			},
			{
				"stage" : "FETCH",
				"filter" : {
					"Financials" : {
						"$elemMatch" : {
							"$and" : [
								{
									"Last" : {
										"$eq" : true
									}
								},
								{
									"Sales" : {
										"$lte" : 1000000000
									}
								},
								{
									"Sales" : {
										"$gte" : 100000000
									}
								}
							]
						}
					}
				},
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"Type" : 1
					},
					"indexName" : "Type_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"Type" : [ ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"Type" : [ "[1, 1]" ]
					}
				}
			},
			{
				"stage" : "FETCH",
				"filter" : {
					"$and" : [
						{
							"Financials" : {
								"$elemMatch" : {
									"$and" : [
										{
											"Last" : {
												"$eq" : true
											}
										},
										{
											"Sales" : {
												"$lte" : 1000000000
											}
										},
										{
											"Sales" : {
												"$gte" : 100000000
											}
										}
									]
								}
							}
						},
						{
							"Type" : {
								"$eq" : 1
							}
						}
					]
				},
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"Financials.Last" : 1
					},
					"indexName" : "Financials.Last_1",
					"isMultiKey" : true,
					"multiKeyPaths" : {
						"Financials.Last" : [ "Financials" ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"Financials.Last" : [ "[true, true]" ]
					}
				}
			},
			{
				"stage" : "FETCH",
				"filter" : {
					"$and" : [
						{
							"Type" : {
								"$eq" : 1
							}
						},
						{
							"Financials" : {
								"$elemMatch" : {
									"$and" : [
										{
											"Last" : {
												"$eq" : true
											}
										},
										{
											"Sales" : {
												"$lte" : 1000000000
											}
										},
										{
											"Sales" : {
												"$gte" : 100000000
											}
										}
									]
								}
							}
						}
					]
				},
				"inputStage" : {
					"stage" : "AND_SORTED",
					"inputStages" : [
						{
							"stage" : "IXSCAN",
							"keyPattern" : {
								"Type" : 1
							},
							"indexName" : "Type_1",
							"isMultiKey" : false,
							"multiKeyPaths" : {
								"Type" : [ ]
							},
							"isUnique" : false,
							"isSparse" : false,
							"isPartial" : false,
							"indexVersion" : 2,
							"direction" : "forward",
							"indexBounds" : {
								"Type" : [ "[1, 1]" ]
							}
						},
						{
							"stage" : "IXSCAN",
							"keyPattern" : {
								"Financials.Last" : 1
							},
							"indexName" : "Financials.Last_1",
							"isMultiKey" : true,
							"multiKeyPaths" : {
								"Financials.Last" : [ "Financials" ]
							},
							"isUnique" : false,
							"isSparse" : false,
							"isPartial" : false,
							"indexVersion" : 2,
							"direction" : "forward",
							"indexBounds" : {
								"Financials.Last" : [ "[true, true]" ]
							}
						}
					]
				}
			}
		]
	},
	"serverInfo" : {
		"host" : "mongodb-1",
		"port" : 27017,
		"version" : "4.2.6",
		"gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8"
	},
	"ok" : 1
}

Hi @Anteaus

I think you’re hitting on a peculiarity of a multikey index, especially when paired with an $elemMatch :slight_smile:

Let me give an example. Suppose we’re searching for a specific element in an array using $elemMatch:

> db.test.find( {arr: { $elemMatch: {field:123} } })
[
  { _id: 2, arr: [ { field: 123 } ] }
]

Quite straightforward. The query returns one document.

However if we remove the $elemMatch:

> db.test.find( {arr:{field:123}} )
[
  { _id: 1, arr: { field: 123 } },
  { _id: 2, arr: [ { field: 123 } ] }
]

There are actually two documents matching that criteria. One with arr as an array, and one as a subdocument.

Now if we direct our attention to a very subtle sentence in the documentation for $elemMatch:

The $elemMatch operator matches documents that contain an array field with at least one element that matches all the specified query criteria.

The keyword here is “documents that contain an array field”. In other words, a query using $elemMatch must return a set of documents where the field is an array. This is why in the first example it returns only one document, while on the second example it returns all matching documents, array or otherwise.

Therefore the FETCH stage is needed in this case, because MongoDB can’t tell for sure that a field is an array for 100% of the documents in a collection, or if there are some documents that could match the query criteria, but is not an array. $elemMatch mandates that it must be an array, so it needs to check the actual document. This fact is not visible from the index alone.

Best regards
Kevin

3 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.