I’m currently developing an app that requires data aggregations based on dates.
We have a small framework and we once decided to store all dates as strings. (UTC in ISO format)
Now, this works fine but when it comes to aggregations on dates, the MongoDB query functions rely on BSON dates.
So my question is actually: what is the best approach?
Keep using storing the dates as strings in the documents and use the $dateFromString function
Store the dates as actual dates in the documents because the performance will be much better
Thank you already for all the expertise you can share with this new kid on the blob.
My preference is to store the native BSON date. I do not have numbers but since BSON are some kind of integer, they must use less space and comparisons must be faster. Less space means smaller working sets. This gives more chance to fit in RAM. So less I/O.
This being said it should be quite easy to write a small benchmark to confirm the above claims.
However if your string dates are ISO and Utc with year-month-day order you should not need to convert when comparing.
I agree with what @steevej said but i would add to it the question “how do you plan to use these dates”?
You mention querying and also aggregating and you mentioned being concerned about performance but what about correctness? Do you plan to do date arithmetic? You’ll end up having to convert your string dates to BSON dates anyway.
Another question to consider is what were the reasons that made you originally decide to store dates as strings? Those might be valid reasons to continue that way, or they may be reasons that didn’t really hold.
Well, to be honest, I can’t really remember anymore why we shifted towards dates as strings.
My best guess would be that we noticed issues in date handling between our backend services and our web applications…
Where in the past the datetimes were used just to record the time of the event, now we also have a requirement to aggregate on the parts of these dates. So basically my question can be rephrased as:
Assuming I have 10M documents that are created in the past 14 days, if I would like to see the number of records per day per hour, would the $dateFromString function give me a penalty over the native type? I agree with Steeve, it is more than likely that the native type will be much faster and it can be benchmarked.
So the real question is indeed your question: why would one use strings for dates?