[Performance] Query performance issue with 0 records as resultset

ilmagowalter · July 18, 2023, 11:35pm

Hi,

i have a problem with performance of query; i have a collection with 81 millions of records, where the fields that indicating a macro-category is “idTracciato”

  "idTracciato": "458"", "count": 23275575
  "idTracciato": "488"", "count": 1207470
  "idTracciato": "500"", "count": 1121987
  "idTracciato": "511"", "count": 956498
  "idTracciato": "456"", "count": 789206
  "idTracciato": "475"", "count": 520500
  "idTracciato": "304"", "count": 18014
  "idTracciato": "207"", "count": 15760
  "idTracciato": "107"", "count": 12613
  "idTracciato": "198"", "count": 9457
  "idTracciato": "411"", "count": 7166
  "idTracciato": "100"", "count": 6304
  "idTracciato": "410"", "count": 5474
  "idTracciato": "462"", "count": 3587
  "idTracciato": "132"", "count": 3156
  "idTracciato": "117"", "count": 3154
  "idTracciato": "102"", "count": 3152
  "idTracciato": "232"", "count": 3152
  "idTracciato": "177"", "count": 3152
  "idTracciato": "210"", "count": 3152
  "idTracciato": "461"", "count": 2594
  "idTracciato": "482"", "count": 2020

my queries have as last step “$limit” operator with 16 records, so if I have more than 16 records as a result of the query, mongo replies in 1.4 seconds; otherwise if i have no records mongo don’t reply ( after 10 minutes )

the query that have result is

db.flussi_dettagli.aggregate([
			{
				"$match" : {
					"idTracciato" : Long("458")
				}
			},
			{
				"$match" : {
					"dataCreazione" : {
						"$lte" : ISODate("2023-04-05T13:46:00.200+02:00")
					}
				}
			},
			{
				"$match" : {
					"stato" : {
						"$ne" : "CACHED"
					}
				}
			},
			{
				"$match" : {
					"metadata.annoRiferimento" : Long("2023"),
					"metadata.periodoRiferimento" : "01"
				}
			},
			{
				"$addFields" : {
					"validazioniReduce" : {
						"$reduce" : {
							"input" : "$validazioni",
							"initialValue" : {
								"dataAzione" : 0
							},
							"in" : {
								"$cond" : [
									{
										"$gte" : [ "$$this.dataAzione", "$$value.dataAzione" ]
									},
									"$$this",
									"$$value"
								]
							}
						}
					}
				}
			},
			{
				"$match" : {
					"flagVersioneMassima" : true
				}
			},
			{
				"$addFields" : {
					"keyNoStato" : "$key"
				}
			},
			{
				"$project" : {
					"keyNoStato.stato" : 0
				}
			},
			{
				"$project" : {
					"_id" : "$keyNoStato",
					"data" : 1,
					"metadata" : 1,
					"progressivo" : 1,
					"progressivoMassimo" : 1,
					"_oid" : "$_id",
					"stato" : 1,
					"validazioni" : 1,
					"validazioniReduce" : 1,
					"derived" : 1,
					"key" : 1,
					"tempDiscard" : 1,
					"idUtente" : 1,
					"idTracciato" : 1,
					"idFlusso" : 1,
					"component_1" : 1,
					"component_2" : 1,
					"component_3" : 1,
					"component_4" : 1
				}
			},
			{
				"$skip" : 0
			},
			{
				"$limit" : 16
			}
		]
)

if i add other stage of match like below

			{
				"$match" : {
					"metadata.annoRiferimento" : Long("2024"),
					"metadata.periodoRiferimento" : "01"
				}
			},

don’t reply…

please help me, i’m in trouble

ilmagowalter · July 20, 2023, 12:03pm

Someone has ideas for this problem ?

John_Sewell · July 20, 2023, 12:07pm

Have you run an .explain on the two scenarios?

What indexes are on your collection?

ilmagowalter · July 20, 2023, 6:33pm

the explain for scenario 1 ( with data aas result )

          "winningPlan": {
            "stage": "FETCH",
            "filter": {
              "flagVersioneMassima": {
                "$eq": true
              }
            },
            "inputStage": {
              "stage": "IXSCAN",
              "keyPattern": {
                "idTracciato": 1,
                "dataCreazione": 1,
                "stato": 1,
                "flagVersioneMassimaVERSIONED": 1
              },
              "indexName": "idx_max_version_VERSIONED_with_data",
              "isMultiKey": false,
              "multiKeyPaths": {
                "idTracciato": [],
                "dataCreazione": [],
                "stato": [],
                "flagVersioneMassimaVERSIONED": []
              },
              "isUnique": false,
              "isSparse": false,
              "isPartial": false,
              "indexVersion": 2,
              "direction": "forward",
              "indexBounds": {
                "idTracciato": [
                  "[458, 458]"
                ],
                "dataCreazione": [
                  "(true, new Date(1680695160200)]"
                ],
                "stato": [
                  "[MinKey, \"CACHED\")",
                  "(\"CACHED\", MaxKey]"
                ],
                "flagVersioneMassimaVERSIONED": [
                  "[MinKey, MaxKey]"
                ]
              }
            }
          },

with the second scenario i ran .explain() and after 600 seconds i did have a response

i have others index in collection but not with “metadata.periodoRiferimento” e “metadata.annoRiferimento”

but … this query is used to populate a data grid, and column can essere different between idTracciato x and idTracciato y

so i can add many filters

the problem happen when with fthe filters used i don’t have record to show

John_Sewell · July 20, 2023, 6:56pm

Can you put up the query that fails as well to check exactly what that looks like?

ilmagowalter · July 20, 2023, 7:39pm

the query is

db.flussi_dettagli.aggregate([
			{
				"$match" : {
					"idTracciato" : Long("458")
				}
			},
			{
				"$match" : {
					"dataCreazione" : {
						"$lte" : ISODate("2023-04-05T13:46:00.200+02:00")
					}
				}
			},
			{
				"$match" : {
					"stato" : {
						"$ne" : "CACHED"
					}
				}
			},
			{
				"$match" : {
					"metadata.annoRiferimento" : Long("2024"),
					"metadata.periodoRiferimento" : "01"
				}
			},
			{
				"$addFields" : {
					"validazioniReduce" : {
						"$reduce" : {
							"input" : "$validazioni",
							"initialValue" : {
								"dataAzione" : 0
							},
							"in" : {
								"$cond" : [
									{
										"$gte" : [ "$$this.dataAzione", "$$value.dataAzione" ]
									},
									"$$this",
									"$$value"
								]
							}
						}
					}
				}
			},
			{
				"$match" : {
					"flagVersioneMassima" : true
				}
			},
			{
				"$addFields" : {
					"keyNoStato" : "$key"
				}
			},
			{
				"$project" : {
					"keyNoStato.stato" : 0
				}
			},
			{
				"$project" : {
					"_id" : "$keyNoStato",
					"data" : 1,
					"metadata" : 1,
					"progressivo" : 1,
					"progressivoMassimo" : 1,
					"_oid" : "$_id",
					"stato" : 1,
					"validazioni" : 1,
					"validazioniReduce" : 1,
					"derived" : 1,
					"key" : 1,
					"tempDiscard" : 1,
					"idUtente" : 1,
					"idTracciato" : 1,
					"idFlusso" : 1,
					"component_1" : 1,
					"component_2" : 1,
					"component_3" : 1,
					"component_4" : 1
				}
			},
			{
				"$skip" : 0
			},
			{
				"$limit" : 16
			}
		]
).explain();

is stuck in running

now, i create a new index

db.flussi_dettagli.createIndex(
    {
        "idTracciato": 1,
        "dataCreazione": 1,
        "stato": 1,
        "flagVersioneMassimaVERSIONED": 1,
        "metadata.periodoRiferimento": 1,
        "metadata.annoRiferimento": 1
    },
    {
        name: "idx_max_version_VERSIONED_with_data_periodo"
    }
);

and now the query is fast!!!

the question is… every time i have to add a new match condition… i have to create an index ? if this, i will have many many indexes… is a problem ?

John_Sewell · July 21, 2023, 7:53am

Comparing the two queries I can only see a difference of changing the VALUE that’s being searched for not the field, assuming that it’s an extra field you’re searching for…

Yes…if you add a new property that you want to filter on you’ll need to add a new index to cover that field.

This may not be the best design approach to storing the data however, if you have a bunch of fields you want to be able to index for searching, i.e.

{
    seardchField1:'A',
    seardchField2:'B',
    seardchField3:'C',
}

Where you’ll need an index on each one, you could do this instead:

{
    searchData:[
        {
            fieldName:'seardchField1',
            fieldValue:'A'
        },
        {
            fieldName:'seardchField2',
            fieldValue:'B'
        },
        {
            fieldName:'seardchField3',
            fieldValue:'C'
        },
    ]
}

Add an index on both sub-elements

db.getCollection("Test").createIndex({'searchData.fieldName', 1, 'searchData.fieldValue':1})

and then this:

db.getCollection("Test").find({
    searchData: {
       $elemMatch: {
           fieldName: "seardchField3", 
           fieldValue: "A"
       }
    }
}
)

Will hit the index.

Take a look here:

Indexes are expensive in terms of memory so if you have lots of search requirements then it’s worth thinking about how to store the data to align with the search needs.
As an example we have a huge amount of documents in our system and create a common search area within documents to store data in an easily indexable format.

ilmagowalter · July 21, 2023, 8:43am

Yes, there is an error in my paste … first query have to be without match stage on metadata.periodoRiferimento and metadata.annoRiferimento

Regards pattern i will read the article but i aren’t surr that is appliable in my application, because apart some specific fields, the data structure can be different between documents

I understand that i must have indexes, but cause i can have filters on various combination of specific fields. Have i to create one index for all combinations of filters fields ?

John_Sewell · July 21, 2023, 8:51am

Indexes are sensitive to order of fields as well as direction (more important when sorting etc)

But if you have an index on A,B,C then you cannot use that index (very efficiently) to search on just C or B, but you could use it to search for A only, or A and B or A and B and C

I didn’t realise this until I just looked over the index documentation but Mongo will use index intersection:

So worth a read on that if you have multiple fields to index.

This may also be worth a read:

ilmagowalter · July 24, 2023, 6:23am

Thanks John for the clarification

system · July 29, 2023, 6:23am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.