$lookup - Does Data Type of Join fields matter from Performance stand point?

When you use $lookup to join two collections does the data type of the join fields make a difference when it comes to performance?

I could use Object ID, a Integer Value or a String value as my join fields. Each take their own number of bytes for storage. But when it comes to join, would it make a difference.

Use Case

Transactional Collection has ID and Code
Master Data Collection ID, Code and Description

To avoid any reference lookup’s, I might decide to only save Code in Transactional collection and avoid ID if that is not necessary and will not help me performance wise.

Any feedback?

Hi @Vikram_Bade ,

I can think of an index size and effecincy that can be impacted by the data types in case join uses the indexes.

Having said that, we recommend data that is accessed together to be stored together therefore if you don’t have a justification to keep this data in seperate collections store the logical relationship in a single document and avoid the join.

This will boost speed troumendsly :wink:

Thanks
Pavel

Thanks @Pavel_Duchovny

I am fully sold on storing the data together and avoid joins to the maximum extent. I wanted to know if I really have a justified reason of storing data in a collection. One use case may be master data, you can have base information of master data which is most often used as part of main collection. There “may be” some extra information regards to master data which is stored in another collection on very case by case use.

So, you mean to say, data types do impact efficiency of the lookup and try to join using either ID or numerical data types over string data types?

Thanks,
Vikram

Hi @Vikram_Bade

I cannot quantify the impact so it needs testing actually.

However, whats the reason not to have an extra field on some documents with the extra information. MongoDB documents can use polymorphism so that can be only an advantage.

Building with Patterns: The Polymorphic Pattern | MongoDB.

Thanks
Pavel

@Pavel_Duchovny

I am not having a specific use case, the reason of this question is to get an idea of options we have to help on design decisions if it really comes to a situation where we need to use lookup and if yes, what should be ideal data types for look up.

Thanks

My opinion.

One important thing is that if you do $lookup ensure that the types are the same in the referring collection and the referred collection. For example, if you have _id:ObjectId(hexstring) in the referred collection make sure you have referred:ObjectId(hexstring) rather than referred:hexstring in the referring collection. Otherwise a $lookup will require the conversion of hexstring to ObjectId(hexstring) for every references and may require a $lookup variant that is less efficient.

An index with string values takes more space, so with numbers your working set that fits in RAM is bigger thus faster. Comparing numbers is usually faster than comparing strings. I suspect that an ObjectId is comparable to numbers.