MongoDB randomly deleting objects in collection

Hey there guys,

I’ve been working with MongoDB for a uni project. Long story short, we’re doing things with weather data and we’re using Mongo to store historical data for a select few locations.

I’ve noticed that when I check the data store, it’s randomly deleted objects from the db.


As you can see from the screenshot above, we might have one hour from 2016-01-01, and then it skips to 2016-01-03. This isn’t how the data was uploaded; it looked something more akin to this:
date: 01/01/2016 hour: 0
date: 01/01/2016 hour: 1
date: 01/01/2016 hour: 2
date: 01/01/2016 hour: 3
etc…

This has happened once before, but I reuploaded our data via Compass and thought it was just a one-off. I’ve looked into TTL; however, I don’t think that is the cause of the issue, since we never established any TTL when initially uploading.

If anyone has any ideas on what’s happening or how it can be stopped, please let me know! And let me know if you have any questions, I’ll try and answer them quickly :slightly_smiling_face:

Thanks!

It is very unlikely that MongoDB is deleting documents on its own.

The data might not be in order. For example, if the hour field is a string, the natural order will not be 0, 1, 2, 3, 4, …; it will most likely be “0”, “1”, “10”, “11”, “12”, “13”, as strings are not sorted like numbers.
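To illustrate the string-ordering point, here is a minimal Python sketch. The hour values are made up for the example and are not taken from the actual collection; it only shows how string comparison differs from numeric comparison.

```python
# Hour values stored as strings (illustrative sample, not real data).
hours_as_strings = [str(h) for h in range(24)]

# Strings are compared character by character, so "10" sorts before "2".
string_order = sorted(hours_as_strings)

# Converting each value to int for the comparison restores numeric order.
numeric_order = sorted(hours_as_strings, key=int)

print(string_order[:5])   # ['0', '1', '10', '11', '12']
print(numeric_order[:5])  # ['0', '1', '2', '3', '4']
```

If the hour were stored as a number in the first place, no conversion would be needed when sorting.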

What is your data source? How do you ensure that you have all the data in the source?

If you keep dateString, despite it being wasteful because you already have date, I would suggest that you, at least, keep it in the ISO 8601 format. See the ISO page “ISO 8601 — Date and time format” for some reasons.
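As a rough sketch of what that conversion could look like in Python: the helper name `to_iso_date` is hypothetical, and it assumes the existing strings are in day/month/year order, which the thread does not confirm.

```python
from datetime import datetime


def to_iso_date(date_string: str) -> str:
    """Convert a 'DD/MM/YYYY' string to an ISO 8601 date ('YYYY-MM-DD')."""
    # Assumption: the day comes first. Use "%m/%d/%Y" if the month comes first.
    return datetime.strptime(date_string, "%d/%m/%Y").date().isoformat()


print(to_iso_date("01/01/2016"))  # 2016-01-01
print(to_iso_date("25/12/2016"))  # 2016-12-25
```

One practical benefit is that ISO 8601 date strings sort chronologically even when compared as plain strings.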


I’ll quickly try and answer some things here :slight_smile:

The data might not be in order. For example, if the hour field is a string, the natural order will not be 0, 1, 2, 3, 4, …; it will most likely be “0”, “1”, “10”, “11”, “12”, “13”, as strings are not sorted like numbers.

So with this, I believe it’s already ordered by the ISO date format, and when we initially uploaded the data, it did show the hours in order (albeit as strings). It’s worth mentioning too that we’re using GraphQL for this, which does support Int and Float, so I’ll take a look and see if we can change to those. I’ve also just re-queried our DB, which still shows certain hours as missing.

What is your data source? How do you ensure that you have all the data in the source?

Our data source is Meteostat, where we gather all data from a particular weather station, then keep anything before 01/01/2016. The method of checking everything is there sadly isn’t particularly advanced, but we’ve done random spot checks over a variety of days to make sure what we expect to be there is there.
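A more systematic alternative to spot checks could be sketched as below. This is only an illustration under assumptions: the function name `find_missing_hours` is hypothetical, the timestamps are made-up samples, and it assumes exactly one record per whole hour.

```python
from datetime import datetime, timedelta


def find_missing_hours(timestamps, start, end):
    """Return every whole hour in [start, end) that has no matching timestamp."""
    present = set(timestamps)
    missing = []
    current = start
    while current < end:
        if current not in present:
            missing.append(current)
        current += timedelta(hours=1)
    return missing


# Made-up sample: hours 0, 1 and 3 are present, hour 2 is absent.
sample = [datetime(2016, 1, 1, h) for h in (0, 1, 3)]
gaps = find_missing_hours(sample, datetime(2016, 1, 1, 0), datetime(2016, 1, 1, 4))
print(gaps)  # [datetime.datetime(2016, 1, 1, 2, 0)]
```

Running the same check against both the source export and the data read back from the database would show whether the gaps already exist before upload.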

If you keep dateString , despite being wasteful because you have date

And I’ll also take a look into this - this was implemented quickly to display on our frontend, however, I’m sure we can use a function to change this to a more “traditional” format. :slight_smile:

Welcome to the MongoDB Community @Luke_Coleman!

What sort order are you specifying for your query? It sounds like you are expecting the order of result documents to match insertion order, which is only guaranteed for the special case of a capped collection (see: What is the default sort order when none is specified?).

If you are sorting based on date components as string values, the lexicographic order will be based on string comparisons (characters and length) as @steevej suggested.

However, it looks like you have a proper date field you could use for sorting (which should also obviate the need to duplicate the date information in various string formats).

To confirm this isn’t an issue with the results returned by your query, you could also try searching for the documents presumed missing from 2016-01-02.
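A minimal sketch of such a check in Python follows. It assumes the field is named `date`, is stored as a BSON date, and that the timestamps are in UTC; the helper `day_filter` and the `collection` handle mentioned in the comments are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def day_filter(year: int, month: int, day: int) -> dict:
    """Build a MongoDB filter for documents whose `date` falls on the given UTC day."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    # Half-open range: start <= date < following midnight.
    return {"date": {"$gte": start, "$lt": end}}


query = day_filter(2016, 1, 2)
print(query)

# With PyMongo (not imported here), the filter could be used like this,
# where `collection` is a hypothetical pymongo Collection handle:
#   print(collection.count_documents(query))
#   for doc in collection.find(query).sort("date", 1):
#       print(doc)
```

The explicit `.sort("date", 1)` in the commented usage asks for ascending date order rather than relying on whatever order the server happens to return.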

Regards,
Stennie

Sounds like the documentation is misleading here?

The documents are returned in insertion order: (…)
No capped collection here.