Hello, I need to insert a document by generating the unique “_id” server-side. I have two questions:
(1) Assuming that there are potentially 100 processes that can insert a new document at the same time, is the probability of a collision between the various processes negligible?
(2) Assuming that the collection already contains 1 billion documents, is the probability that a new document gets an id that is already present in the collection negligible?
_id values are generated client-side by the driver when it sends the documents to the cluster, if one doesn’t already exist. It’s buried somewhere in the insertOne and insertMany source code; where exactly depends on which driver you are using, of course.
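ObjectIds are built in a very specific manner. Each one is 12 bytes:
- a 4-byte timestamp, the creation time in seconds since the Unix epoch,
- a 5-byte random value, generated once per process,
- a 3-byte incrementing counter, initialized to a random value.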
So first of all, each of the 100 processes has its own random value in the middle. Two processes are very unlikely to end up with the same one, because 5 bytes * 8 bits = 40 bits, which gives 2^40 (about 1.1 trillion) possible values.
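To put a rough number on that, the chance that any two of the 100 processes pick the same 5-byte value can be estimated with the birthday approximation. This is only a back-of-the-envelope sketch: it assumes each value is drawn uniformly at random from 40 bits, and the function name is mine, not something from a driver.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that at least two of n uniform random
    values drawn from a space of 2**bits are equal."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * space)).
    # expm1 keeps precision when the result is very close to zero.
    return -math.expm1(-n * (n - 1) / (2 * space))


# 100 processes, 5 bytes = 40 bits of randomness each
p = collision_probability(100, 40)
print(f"{p:.2e}")
```

For 100 processes this comes out on the order of 4.5e-9, and that is only the chance of two processes sharing the random part, not of two documents actually getting the same id.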
The timestamp and the counter are there to make sure that documents inserted during the same second get a different ObjectId within the same process.
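So it’s practically impossible to generate the same ObjectId twice within a single process (the 3-byte counter would have to wrap around within one second, which takes more than 16 million ids), and very unlikely that 2 processes writing to the same cluster share the same random value in the middle. Even if they did, the incrementing counter starts at a random value in each process, so it has your back anyway. The 1 billion existing documents don’t change much either, assuming they all have driver-generated ObjectIds: a new id can only clash with one generated during the same second.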
So I guess the probability is very very VERYYYY close to zero.
Cheers,
Maxime.
Very clear explanation. Thank you!