Most effective approach for modelling? Number vs String

Hi, I have a users collection and I am trying to create a proper model in MongoDB / Mongoose.
I wonder how I should store selectable data so that indexing and querying are as performant as possible.
For example, every user has one of the following study situations →

highschool, undergrad, grad, doctorate, working

I can define this field as a String or as a Number, for example 1 for highschool, 2 for undergrad, 3 for grad, 4 for doctorate, 5 for working.

So we would end up with two options →

User: {
  "studycode": "undergrad"   or   "studycode": "2"
}

Which would be more performant when I create an index or a compound index on this field?

I encounter similar dilemmas with bigger fields like hobbies. I keep hobbies in a separate collection and use their _id values in user documents. Is _id a good choice for indexing? Wouldn't it be faster if I assigned numbers to hobbies (0, 1, 2, 3, 4, 5, ...) and stored those instead of the _id? I want indexed queries on those fields to be fast.
Thanks

Number comparisons should be faster than string comparisons. Note that the number 2 inside quotes ("2") is a value of type string, not a value of type number. I wrote "should" because an indexed string will still be faster than a non-indexed number. In this case, your key set is small and the values differ at the first letter, so you should not see a lot of difference.
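To illustrate the point about quotes, here is a minimal sketch in plain JavaScript. The `users` array and the `findByStudycode` helper are hypothetical stand-ins for a collection and an equality filter; the strict comparison only mimics the fact that a number and a string are different values, it is not how the server evaluates queries.

```javascript
// Hypothetical in-memory stand-in for a users collection.
const users = [
  { name: "a", studycode: 2 },   // stored as a number
  { name: "b", studycode: "2" }, // stored as a string
];

// Hypothetical helper: type-strict equality, so 2 and "2" never match each other.
function findByStudycode(docs, value) {
  return docs.filter((doc) => doc.studycode === value);
}

const matchedByNumber = findByStudycode(users, 2).map((u) => u.name);
const matchedByString = findByStudycode(users, "2").map((u) => u.name);

console.log(typeof 2, typeof "2"); // number string
console.log(matchedByNumber);      // [ 'a' ]
console.log(matchedByString);      // [ 'b' ]
```

So if you go with numbers, make sure the schema type and the values you write are really numbers and not digits in quotes.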

Numbers take less space than strings, so more data fits in cache. This can tip the balance in favor of numbers.
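As a rough back-of-the-envelope check, the sketch below estimates the size of a single field under the BSON layout (one type byte, the key as a NUL-terminated string, then the value). The helper names are made up for this example, and real index and document sizes also depend on compression and storage details, so treat the numbers as an approximation only.

```javascript
// Bytes used by the type tag and the key: 1 type byte + key bytes + 1 NUL.
function keyOverhead(key) {
  return 1 + Buffer.byteLength(key, "utf8") + 1;
}

// String value: 4-byte length prefix + UTF-8 bytes + 1 NUL terminator.
function stringElementSize(key, value) {
  return keyOverhead(key) + 4 + Buffer.byteLength(value, "utf8") + 1;
}

// Fixed-width values: int32 is 4 bytes, double is 8 bytes, ObjectId is 12 bytes.
function int32ElementSize(key) { return keyOverhead(key) + 4; }
function doubleElementSize(key) { return keyOverhead(key) + 8; }
function objectIdElementSize(key) { return keyOverhead(key) + 12; }

console.log(stringElementSize("studycode", "undergrad")); // 25
console.log(stringElementSize("studycode", "2"));         // 17
console.log(int32ElementSize("studycode"));               // 15
console.log(doubleElementSize("studycode"));              // 19
console.log(objectIdElementSize("studycode"));            // 23
```

Note that the key name itself accounts for a good part of each element, so the saving per document is a handful of bytes.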

Descriptive strings are more readable.

It is a tradeoff.

You could use a StudyCode collection to map your number code to a readable string. I have done it, and it helps me with localisation of the UI because StudyCode can hold language-specific values like

{ _id: 2, en: "high school", fr: "étude secondaire" }

In a case like that you could generate your own _id as a small number rather than using an ObjectId. I have done it.
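A minimal sketch of that mapping, in plain JavaScript with an in-memory array standing in for the StudyCode collection. The second entry, the `labelFor` helper and the fallback rules are assumptions made for illustration, not something from my actual setup.

```javascript
// In-memory stand-in for the StudyCode collection, keyed by a small numeric _id.
const studyCodes = [
  { _id: 2, en: "high school", fr: "étude secondaire" }, // the example document above
  { _id: 3, en: "undergrad", fr: "premier cycle" },      // hypothetical extra entry
];

// Build a lookup table once; users only store the numeric code.
const studyCodeById = new Map(studyCodes.map((doc) => [doc._id, doc]));

// Hypothetical helper: resolve a code to a label in the requested language,
// falling back to English, then to the raw code if it is unknown.
function labelFor(code, lang) {
  const doc = studyCodeById.get(code);
  if (!doc) return String(code);
  return doc[lang] ?? doc.en;
}

const user = { name: "alice", studycode: 2 };
console.log(labelFor(user.studycode, "fr")); // étude secondaire
console.log(labelFor(user.studycode, "de")); // high school
console.log(labelFor(99, "en"));             // 99
```

The user documents and the index on studycode stay small and numeric, and the readable text lives in one place.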

Thanks for your answer. I will also have a Hobbies collection where I keep hobbies. There are more than 100 hobbies, and each user carries a few of them in an array. So it is much better to give them Number _id values for querying, right? Users will have them in an array like hobbies: [hobby1, hobby2], so is it better to store them as numbers rather than as ObjectId values?

When the mapping collection is big and/or dynamic, I find it easier to let the system generate the _id as an ObjectId.
