If I fetch data from a database with 1 million entries using pymongo and multiprocessing, in 10 chunks of 100k each, do I get all entries, or can it happen that some are missed?
I have run the test 500 times and checked whether all items were included. So far it has worked every time.
But is there a guarantee for this?
I fetch my data with skip and limit:
cursor = col.find().skip(skip_value).limit(100000)
Where skip_value is always 0, 100k, 200k, …, 900k.
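A minimal sketch of this setup, assuming a local mongod with hypothetical database/collection names "testdb"/"testcol" (the original post does not name them). The actual fetch is wrapped in functions so nothing connects at import time:

```python
# Sketch: fetch 1M documents in 10 parallel chunks of 100k via skip/limit.
# Database name, collection name, and connection settings are assumptions.
from multiprocessing import Pool

TOTAL = 1_000_000
CHUNK = 100_000

def chunk_offsets(total=TOTAL, chunk=CHUNK):
    """Return the skip values 0, 100k, 200k, ..., 900k."""
    return list(range(0, total, chunk))

def fetch_chunk(skip_value):
    # Each worker opens its own client: pymongo's MongoClient is not
    # fork-safe and must not be shared across processes.
    from pymongo import MongoClient
    client = MongoClient()                 # assumed local mongod
    col = client["testdb"]["testcol"]      # assumed names
    # Note: without an explicit sort, the document order behind these
    # ten windows is not guaranteed to be the same for every query.
    docs = list(col.find().skip(skip_value).limit(CHUNK))
    client.close()
    return docs

def fetch_all():
    with Pool(10) as pool:
        chunks = pool.map(fetch_chunk, chunk_offsets())
    return [doc for chunk in chunks for doc in chunk]
```

The comment in `fetch_chunk` is exactly the concern raised below: the windows only partition the data if all ten queries see one stable order.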
Documents are like:
If you want to rely on the order of documents you must sort.
I did not see anywhere in the documentation that the order is deterministic.
OK, thanks. I have also found nothing in the documentation, so I will do it with sort().
If we look at the
cursor.sort() documentation, under the Examples section we find this note:
The following query, which returns all documents from the
orders collection, does not specify a sort order:
The query returns the documents in indeterminate order:
We also see the following earlier in the document, around sort consistency:
MongoDB does not store documents in a collection in a particular order. When sorting on a field which contains duplicate values, documents containing those values may be returned in any order.
If consistent sort order is desired, include at least one field in your sort that contains unique values. The easiest way to guarantee this is to include the
_id field in your sort query.
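Following that recommendation, the query only needs an explicit sort on _id. Below is a sketch of the adjusted worker (same assumed names as before), plus a small pure-Python model showing why a unique sort key makes the skip/limit windows partition the data exactly:

```python
def fetch_chunk_sorted(skip_value, chunk=100_000):
    # Same hypothetical setup as the original post; only sort() is new.
    from pymongo import MongoClient
    client = MongoClient()                 # assumed local mongod
    col = client["testdb"]["testcol"]      # assumed names
    # Sorting on the unique _id field pins a total order, so the ten
    # skip/limit windows cover every document exactly once (assuming
    # no writes happen between the queries).
    docs = list(col.find().sort("_id", 1).skip(skip_value).limit(chunk))
    client.close()
    return docs

def partition_check(ids, chunk):
    """Pure-Python model of windowing over a sorted unique key."""
    ids = sorted(ids)
    pages = [ids[i:i + chunk] for i in range(0, len(ids), chunk)]
    merged = [x for page in pages for x in page]
    # True iff every id lands in exactly one window, in order.
    return merged == sorted(ids)
```

Note the caveat in the comment: sorting fixes the ordering problem, but concurrent inserts or deletes between the ten queries can still shift the window boundaries.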
If no sort() is supplied, MongoDB will pull back the data in the most efficient manner possible, whether that's reading it as stored in memory or on disk. After a lot of inserts/updates/deletes, that order can change. My guess is that during your testing there wasn't much activity going on, so you might not have seen any difference in the order.