20 characters id for natural id (index question)

Brian_Christensen1 · February 20, 2023, 7:12am

I will need to do a project where I will have to generate and track a lot of serialnumbers for physical items (millions).

I am restricted to use 20 characters for the serial number (a-Z0-9). I would like to use the serial numbers as a natural id in MongoDb.

As ObjectId is currently 24characters it will be too long to use directly. How would you suggest that I generate id’s that is efficient for mongodb to index? I’m guessing that if I just generate random strings (some subset of GUID), the index will be too scattered to be able to get indexed properly right?

How would you suggest that I generate efficient id’s?
My current solution is:
[itemType: 2byte][timestamp: 4bytes][random: 4byte]
(There might be 10-20 different itemTypes)

kevinadi · February 21, 2023, 12:59am

Hi @Brian_Christensen1 welcome to the community!

The _id field serves as the primary key for the collection, and if you don’t supply one, an ObjectId will be automatically generated for you.

If by “scattered” you mean that the documents will be stored inefficiently, then this is not the case. It doesn’t matter if the _id field is random or sequential. As long as they’re unique for the collection, that is.

Hope this helps.

Best regards
Kevin

Brian_Christensen1 · February 21, 2023, 12:13pm

Thank’s for your answer.
A couple of years ago I wanted to use UUID’s as primary keys in mysql, but I discovered that it had a hard time indexing the keys efficiently because the values was so “random”. I don’t know how the indexing works in details, but from what you write, it seems like it shouldn’t be a problem?

kevinadi · February 21, 2023, 11:30pm

Hi @Brian_Christensen1

I cannot comment on how MySQL does its data storage, but in MongoDB a random string primary key is not a problem. The default ObjectId is semi-random. In fact, in the documentation I linked to, UUID is one example of a common choice for _id.

In the long term, schema design and how the data is used is more important, in my opinion.

Best regards
Kevin

system · February 26, 2023, 11:30pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.