I’m a new guy in the NoSQL world and I need to migrate an existing project on PostgreSQL to Mongodb.
I’m struggling with one specific query which is really not complicated on SQL side but I cannot manage to do the equivalent in noSql.
Could you help me please ?
The SQL query that I need to “adapt” to NoSql world
select * from transactions t where substring(t.voucherNumber,1,16) in (:voucherList)
As you can see, I store voucherNumbers in 30 characters in database but for a specific workflow, I received a list of vouchers on 16 characters and I need to find all the documents where the voucherNumber starts with one of the list.
It tried regex, clause IN etc, but I cannot manage to do it.
True, in this case if it’s just the prefix we’re searching for a straight regex does make it easier, although you’ll need to form the regex string in code (which is trivial).
Be interesting to see the performance difference between the two though.
I just created a quick collection with about 100,000 random voucher codes and ran the two approaches 10 times or so each.
Both come down to sub 200ms execution time (running on local workstation with lots of CPU and RAM, plus only one field in documents so the index fully covers the query in RAM)
So either way should be good, if you want to compare two fields against each other you’ll want to use the $expr but in this case as Vishal said you can just form a regex.
Note that unless you are using Atlas search indexes and doing something interesting, you need to anchor the start of a regex in order for Mongo to make use of an index when matching against it.
I’m not sure to really understand what I should do concerning “Note that unless you are using Atlas search indexes and doing something interesting, you need to anchor the start of a regex in order for Mongo to make use of an index when matching against it.” ?
I don’t really have a preference between the two solutions.
The first solution of John is working well, and the one of Vishal works too (indeed the regex is trivial to build) and apparently there are no difference on performance side.
How did you measure the performance? I mean if you are checking just taken time is less then it does not mean it performed well, understand the explain() method to check the exact view of the analytics of your query.
For more details. refer to the Performance best practices,
If you want to use the index in voucherNumber then I have already mentioned that in my previous post $expr with $in does not use an index it will do a collection scan, which means it will impact First Disk Speed, especially on a large collection, Second The size of your WiredTiger cache and the size of your overall RAM.
On the other side, you can refer to the documentation about indexing strategies
Vishal is correct, I didn’t notice that from prior use, running on a test collection, while fast on a large dataset it IS performing a colscan using $in embedded within the Expr. I’ve previously used this with other operators which WILL hit an index and so not noticed this restriction.
So I agree, in this scenario don’t embed the $in within the expr and just use a regex, and something to watch out for, until they add this functionality!