Realm filter Query optimization with several CONTAINS parameters and a Realm Reference filter

Hi all!

Fairly new to Realm (first post on these forums) using the JS SDK for building an Electron app using Realm local and eventually will be setting up Realm Sync to store and sync data with MongoDB Atlas.

I come from a SQL Server background. Trying to de-program my relational DB normalization mindset, haha. Slowly but surely loving Realm / Mongo document based DB concept more and more everyday!

My question is on query optimization. Lets say I have the following schema:

let Datastore = {
    name: 'Datastore',
    primaryKey: '_id',
    properties: {
        _id: 'objectId',
        name: 'string',
        description: 'string'
    }
}

let Block = {
    name: 'Block',
    primaryKey: '_id',
    properties: {
        _id: 'objectId',
        sourceUid: 'string',
        content: 'string',
        datastore: 'Datastore?'
    }
}

Lets say that Datastore only has about 10 objects but Block has 5 million. And then lets say for example I want to do several string CONTAINS and LIKE matches on the Block.content but also then filter by a specific datastore.name.

let foundBlocks = realm.Object('Block').filtered('datastore.name == "Source 12345" AND content CONTAINS "cool block" AND content CONTAINS "also this" AND content LIKE "*wildcard*this*that*" AND content BEGINSWITH "starting"');

So I have a couple questions:

  1. Does Realm query engine optimize for the most “efficient” way automatically to filter down the criteria before applying more resource intensive string operators such as LIKE and CONTAINS?
  2. Based on 1, would it instead be more efficient to first query/filter by the Datastore.name since it only has 10 objects and then through that filters down the Block objects more efficiently through the Realm Reference in the Datastore? property? Basically does it make more sense to filter down the 5 million Block objects based on its datastore property to something like 500k objects BEFORE then separately applying the string content CONTAINS and LIKE query operators to give me my final filtered results? OR does Realm automatically optimize for this and therefore I can write it all in one single filter/query like I have above?
  3. Like in other query languages like SQL, typically LIKE operators are very expensive. Is this true in Realm as well? What about CONTAINS? Do you have any tips on the performance between these operators and whether it will make a big performance difference if I can avoid using LIKE and instead use CONTAINS… and similarly if at all possible, using “==” exact matches I assume is much more performant than CONTAINS? What about BEGINSWITH and ENDSWITH?

Any other tips would be much appreciated! Thanks a lot and I look forward to my (our) journey into Realm and eventually MongoDB Atlas!

Best,
Shawn

To be more clear, my question 2 above would mean do something like this:

//first filter down to just Blocks matching the Datastore reference which reduces collection from 5mil to 100k results
let filterBlocks = realm.Object('Block').filtered('datastore.name == "Source 12345"');
//now apply the string content operators to the 100k remaining results
let foundBlocks = filterBlocks.filtered('content CONTAINS "cool block" AND content CONTAINS "also this" AND content LIKE "*wildcard*this*that*" AND content BEGINSWITH "starting"');

So the question becomes does breaking this into two queries perform more efficiently because the string content operators are only applied against 100k Block objects and not the initial full collection of 5mil objects?

Hi @Shawn_Murphy welcome to the community!
You have a good intuition about performance cost of the query.

  1. Yes, the Realm query engine has various heuristics that optimize the performance. There is no guarantee that the query will be run in the order that you write it. In fact, it may jump between the conditions based on what data is loaded from disk in order to minimize cache misses.
  2. I recommend running your entire query at once, this gives the query engine the highest chance of making optimizations. However every dataset is different and when advising about query performance, I like to suggest that you experiment with your queries and see what works for your particular app yourself. If you find a query that doesn’t seem to scale well with your dataset, we may be able to make some further optimizations if you are willing to work with us debugging it.
  3. You are correct about the relative cost of string operators: LIKE > CONTAINS > BEGINSWITH/ENDSWITH > EQUAL.

I also recommend that you try turning on an index for any properties that you are doing exact string matches on (Datastore.name). This should dramatically help performance of the query, at the cost of some overhead to maintain the index on insertion/deletion/modify.

1 Like

@James_Stone thanks for the response! One follow up question on your suggestion on Indexing. I posted another question at link below, but if my property is a string with lots of text like a paragraph and I try to do “contains” queries on it, will indexing help? Or only when doing == exact matches of the entire string property?

See here: How does Realm indexes work with string properties and partial match search