Query specific date and time for every year

Hi All,

Been looking for a solution for a while now but no luck, i have a database with 400k documents and i’m looking to make a query to retrieve result for a specific day of every year.
So for example every document with date 27 oct and it should return a document for 2022, 2021, 2020…

Any idea on how i should approach this?

here’s a sample of one of my documents

{
  "timestamp": {
    "$date": {
      "$numberLong": "284065200000"
    }
  },
  "temp_min": 5.84,
  "sunset": 0,
  "deg": 39,
  "description": "overcast clouds",
  "temp_max": 6.58,
  "icon": "04n",
  "temp": 6.21,
  "_id": {
    "$oid": "630f86dda4d0f75b111babee"
  },
  "visibility": 0,
  "gust": 0,
  "sea_level": 0,

  "humidity": 50,
  "weather_id": 804,



  "feels_like": 2.94,
  "sunrise": 0,
  "main": "Clouds",
 
  "grnd_level": 0,
  "timezone": 3600,
  "id": 0,
  "type": 0,
  "speed": 4.81,

  "pressure": 995
}

Hi @marco_ferrari ,

You can use $dateToParts to extract the day and month as “27oct” string and search for it overall documents:

Let me know if you run into difficulty

Thanks
Pavel

Let me know if I am wrong.

I am worry that a $match with $dateToParts to get day and month from the stored document’s date field would not be able to leverage an index on the date field.

It is not really an issue if this is a infrequent use-case. But what if it is.

One solution could be to use $in with a list of calculated exact dates. The caveat is that you need to limit the years so that the array of date is limited. Using Oct-27, the query, assuming data starts in 2017, could be

{ "$in" : [ 2017-10-27, 2018-10-27, 2019-10-27, 2020-10-27, 2021-10-27, 2022-10-27 ] }

I think the chances for an IXSCAN are higher. But I could be wrong.

An alternative I see is to update the collection with the $dateToParts result and have an index on month:1,day:1,year:1. I include year:1 in the index so we can sort by year using the index.

1 Like

thanks everyone for your response. i’m still trying your suggestions and not yet sure what is the best solution tbh.

i’ll try and share my final solution here for future references

Sorry if I confused you.

The best solution at first is the simplest and it is the one provided by @Pavel_Duchovny.

Any optimization, like the ones I mentioned, should not be implemented unless you have performance problems. Do not make your code or data more complex before you need to.

1 Like

Thanks for your clarification.