Collision Probability of generating _id server-side

Hello, I need to insert documents whose unique “_id” is generated server-side. I have two questions:
(1) Assuming that up to 100 processes can insert a new document at the same time, is the probability of a collision between those processes negligible?
(2) Assuming that the collection already contains 1 billion documents, is the probability that a new document gets an id already present in the collection negligible?

Hi @Matteo_Tarantino,

_id values are generated client-side by the drivers when they send the documents to the cluster, if one doesn’t already exist. It’s buried somewhere in the insertOne and insertMany source code; where exactly depends on which driver you are using, of course.

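ObjectIds are built in a very specific manner: 12 bytes made of a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value generated once per process, and a 3-byte incrementing counter initialized to a random value.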
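To make that structure concrete, here is a small sketch that splits an ObjectId hex string into its parts, assuming the 4-byte timestamp / 5-byte random value / 3-byte counter layout. The function name `split_object_id` and the sample id are only illustrative; they are not part of any driver.

```python
from datetime import datetime, timezone


def split_object_id(hex_id):
    """Split a 24-character ObjectId hex string into its three fields."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    return {
        # First 4 bytes: seconds since the Unix epoch, big-endian.
        "timestamp": int.from_bytes(raw[0:4], "big"),
        # Next 5 bytes: random value chosen once per process.
        "random": raw[4:9].hex(),
        # Last 3 bytes: incrementing counter, big-endian.
        "counter": int.from_bytes(raw[9:12], "big"),
    }


# Arbitrary sample id, used here only for illustration.
parts = split_object_id("507f1f77bcf86cd799439011")
created = datetime.fromtimestamp(parts["timestamp"], tz=timezone.utc)
print(parts, created.isoformat())
```

Two ids can only be equal if all three fields match, which is why comparing the middle field is a quick way to tell whether two ids came from the same process.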

So first of all, the 100 processes each have their own random value in the middle. Two of them are very unlikely to end up with the same one, because 5 bytes * 8 bits = 40 bits, which gives 2^40 (about 1.1 trillion) possible values.
The timestamp and the counter are there to make sure that documents inserted during the same second get different ObjectIds within the same process.
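As a rough illustration of how those pieces fit together, here is a minimal sketch of a generator following that layout. This is not the code of any real driver: the class name `ObjectIdSketch` and the `now` parameter are made up for the example, and real drivers differ in the details.

```python
import os
import struct
import threading
import time


class ObjectIdSketch:
    """Illustrative generator: 4-byte timestamp + 5-byte random + 3-byte counter."""

    def __init__(self):
        # Random value picked once, when the process creates the generator.
        self._process_random = os.urandom(5)
        # The counter starts at a random value and wraps at 2^24.
        self._counter = int.from_bytes(os.urandom(3), "big")
        self._lock = threading.Lock()

    def generate(self, now=None):
        # `now` is a hypothetical hook so the timestamp can be fixed in tests.
        timestamp = int(time.time()) if now is None else now
        with self._lock:
            counter = self._counter
            self._counter = (self._counter + 1) % (1 << 24)
        return (
            struct.pack(">I", timestamp)
            + self._process_random
            + counter.to_bytes(3, "big")
        )


gen = ObjectIdSketch()
# Many ids within the same second still differ, thanks to the counter.
ids = [gen.generate(now=1_700_000_000) for _ in range(1000)]
print(len(set(ids)), ids[0].hex())
```

The lock is there so that two threads in the same process never read the same counter value, which is the property the uniqueness argument relies on.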

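So it’s effectively impossible to generate the same ObjectId twice within a single process (the 3-byte counter would have to wrap around, which takes more than 16 million ids in the same second), and very unlikely that 2 processes writing to the same cluster share the same random value in the middle. Even if they did, the incrementing counter, which starts at a random value, has your back anyway.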

So I guess the probability is very very VERYYYY close to zero.
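To put a rough number on it, here is a back-of-the-envelope birthday-problem calculation for the chance that at least two of 100 processes pick the same 5-byte random value. It assumes the values are uniformly random and independent, and the helper name `collision_probability` is just for this example. Keep in mind this is only an upper bound on an actual _id collision: two processes sharing the random value would also need the same timestamp and the same counter value.

```python
import math


def collision_probability(n, space):
    """Chance that at least two of n uniform random draws from `space` values match."""
    # P(no collision) = product over i of (1 - i/space); summing logs with
    # log1p keeps precision when the individual terms are tiny.
    log_no_collision = sum(math.log1p(-i / space) for i in range(n))
    return -math.expm1(log_no_collision)


# 100 processes, 2^40 possible random values.
p = collision_probability(100, 2**40)
print(f"{p:.3e}")  # roughly 4.5e-09
```

So even before the timestamp and counter come into play, this is on the order of a few chances in a billion.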

Cheers,
Maxime.

Very clear explanation. Thank you!
