Realm Swift Indexes vs. Atlas Collection Indexes

When working with Realm Swift, I can index a property like this:

@Persisted(indexed: true) var name: String = ""

Is this index different than the Atlas Collection Indexes described here? → https://www.mongodb.com/docs/manual/indexes/

I suspect the Atlas Collection indexes aren’t relevant for the Realm SDK, but do they help/hurt in any way? If I set indexes on my Collections (say, for when I query that data outside of Realm) does that impact Realm’s performance in some way (such as slowing down writes/updates even more because now there are two indexes that must be maintained)?

Ideally the Docs would clarify this because I think “Realm Indexes” are not the same thing as Atlas Collection Indexes.

In case an answer is posted to your other question, here’s a link since they involve a similar topic.

Is there a better place to ask this question? A client decided to suddenly add 1,000% more data to their app and searches via the UI have slowed considerably. I’m investigating ways to mitigate that and need to understand the distinction between these two indexes.

As far as I know, these forums and StackOverflow are generally the two places visited by employees that moderate.

Realm is pretty darn fast and we’ve never really had performance issues, even with GBs of data. And probably more importantly, Realm should have negligible impact on the UI.

There are a number of things that can cause ‘slow’ performance, for example casting collections of Realm objects to Swift Arrays and/or using high-level Swift functions like map, reduce or filter. Those load every object into memory and bypass Realm’s lazy loading, which can also impact the UI.

Realm being an offline-first database, queries depend mostly on the hardware it’s running on and the code you’ve crafted. Keep in mind that:

indexes defined in your client-side models are applied only for the local database and will not affect your Atlas cluster. The reason is that you may have different client applications that have different query patterns and may have different local indexes.

Indexes are somewhat tricky in that only certain queries will see improvements: queries that use IN and equality comparisons, and I think sorting, if I recall.

Adding indexes provides a huge improvement on those query use cases. I did a drag race a few years ago on indexed vs non-indexed property and the indexed read took 1/3 the time. See my answer to this SO question for details.

Perhaps if you have some brief sample code that’s “slow”, we may be able to spot something - it could just be the way it is but it can’t hurt to take a look. I think my concern is how it’s affecting the UI as it shouldn’t.

Hi Jay, thanks! I’m familiar with the usual anti-patterns. In this case, the slowness is caused by the nature of the query: case-insensitive, diacritic-insensitive, and CONTAINS.

Imagine iTunes. A giant TableView with 30 columns: title, artist, label, genre, etc. Now imagine there are 1.3 million rows. That’s 1,300,000 x 30 searches plus the overhead of non-exact matching. It’s slow.

I currently don’t have indexes on these properties for exactly the reason you mentioned: indexes don’t speed up CONTAINS queries. As a test, I changed the app to search for matches with exact equality and, in that case, search returns to being virtually instant.

So the challenge is how to improve CONTAINS[cd] query performance across large sets of data where many properties must be searched.

What’s still unclear to me is whether adding indexes to the Atlas Collections will help speed up Realm queries at all (assuming == queries instead of contains). Realm operates on a local copy of the database, but at some point the sync engine has to interface with Atlas and I’m not certain if indexes help or hurt that sync process.

Interesting - and correct, indexing will not help ‘contains’ queries.

Indexes on object models in Realm only apply to local queries and have no impact or effect on what’s stored in Atlas.

Technically you could use App Services to query the server directly, and in that case server-side indexes would certainly boost performance. However, then you’re dealing with internet lag and, more importantly, working with raw data, filling up RAM with big queries. You would lose the lazy-loading-ness of Realm objects, and while the query itself may be quicker, the data still has to get to the app. So likely not a solution.

One of our apps has millions of rows and we display data in a tableView, and Realm is performant and responsive. We don’t display it all at once, so the user doesn’t get a finger cramp scrolling through all of that.

However, while our searches are case- and diacritic-insensitive ([cd]) as well, we only query on one field. It sounds like your queries are across multiple fields?

Yea, the search queries operate on all 30 fields for each row (model object). Realm is still performant at this scale in everything except this search. The Tableview is powered by a live collection and, of course, loads only a small subset of the objects as the user scrolls. That’s all fine.

But when I have to search the entire collection across 30 columns and use CONTAINS[cd] to do it, that hangs for 5-9 seconds.

Realm Swift was supposed to have “full text search”, but that was “coming soon” last summer and there have been no updates from the Swift SDK team on it. I figure that might help me.

I did a little digging and testing and have an option:

I created an object with 30 fields, then created 1M of those objects and populated each field with a random phrase of anywhere from 3 to 8 words. This resulted in a Realm file of about 2.11 GB.

Then in code, I crafted a query to return results using an OR (||) search across all the properties. It looks like this:

let phraseResults = actor.realm.objects(PhraseClass.self).where {
    $0.a.contains("crazy", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.b.contains("rebels", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.c.contains("square", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.d.contains("Children", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.e.contains("people", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.f.contains("fire", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.g.contains("Billy Thorpe", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.h.contains("sky", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.i.contains("genius", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.j.contains("human", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.k.contains("sound", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.l.contains("ignore", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.m.contains("Earth", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.n.contains("crazy", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.o.contains("rebels", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.p.contains("square", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.q.contains("Children", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.r.contains("people", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.s.contains("fire", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.t.contains("Billy Thorpe", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.u.contains("sky", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.v.contains("genius", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.w.contains("human", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.x.contains("sound", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.y.contains("ignore", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.z.contains("Earth", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.za.contains("ships", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.zb.contains("forward", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.zc.contains("doors", options: [.diacriticInsensitive, .caseInsensitive]) ||
    $0.zd.contains("Sun", options: [.diacriticInsensitive, .caseInsensitive])
}

Test 1: ran the query with no options (no [cd]); 863,000 results came back in about a second.

Test 2: ran the query with [cd] as shown; 863,000 results came back in about 12 seconds, matching your results.

Test 3: ran the query with .caseInsensitive only; 863,000 results came back in about 1.5 seconds.

So the conclusion is that diacritic insensitivity is the slowdown (which makes sense).
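How the three tests were timed isn’t shown in the thread, so the following is only a sketch of one way to reproduce them. The `timed` helper is a hypothetical name, not part of Realm or Foundation. Because Realm results are lazy, the closure passed in should force evaluation (for example by returning `phraseResults.count`), otherwise very little work is actually measured.

```swift
import Foundation

/// Hypothetical helper: runs `work` once and reports elapsed wall-clock time.
/// Uses the monotonic Dispatch clock so the measurement cannot go negative.
func timed<T>(_ label: String, _ work: () -> T) -> (result: T, seconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = work()
    let elapsed = DispatchTime.now().uptimeNanoseconds - start
    let seconds = Double(elapsed) / 1_000_000_000
    print("\(label): \(String(format: "%.3f", seconds)) s")
    return (result, seconds)
}

// Stand-in workload; in the app this closure would run the Realm query
// and return its count so the query is actually evaluated.
let sample = timed("sum 1...1000") { (1...1000).reduce(0, +) }
```

Running each variant (no options, case-insensitive only, [cd]) through the same helper keeps the comparison consistent.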

One possible solution is to denormalize the data: store two sets of data, one for display and one for query.

class PhraseClass: Object {
    @Persisted var a = "" //this is used for display
    @Persisted var aStripped = "" //this is used for queries
}

The a property is what’s used in the UI, whereas the aStripped property has had diacritics removed and been set to lower case. While this will increase the size of the file and add additional code, it will provide whipping-fast queries by comparison, because the query on aStripped no longer needs the [cd] options.
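The thread doesn’t show how aStripped gets populated, so here is a minimal sketch of one option, using Foundation’s `folding(options:locale:)`. The function name `searchKey(for:)` is hypothetical. The important part is that the stored value and the user’s search term both go through the same function, so a plain CONTAINS on aStripped behaves like CONTAINS[cd] on a.

```swift
import Foundation

/// Hypothetical helper: strips diacritics and case so the stored search
/// column and the search term can be compared with a plain CONTAINS.
func searchKey(for text: String) -> String {
    text.folding(options: [.diacriticInsensitive, .caseInsensitive], locale: nil)
}

// When writing an object (inside a Realm write transaction in the app):
//     phrase.a = title
//     phrase.aStripped = searchKey(for: title)
// When searching, normalize the user input the same way before querying.
let storedKey = searchKey(for: "Café Olé")
let termKey = searchKey(for: "CAFE")
print(storedKey.contains(termKey))
```

Existing rows would need a one-time pass to fill in the stripped column, and every later write to a has to update aStripped as well, which is the extra code mentioned above.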

Thanks! That is pretty enlightening—I assumed case-insensitivity had a larger impact than what your test shows.

I considered normalizing the properties, but that’s a large increase in database size. I think what I’ll end up doing is normalizing the columns that are most frequently searched and then modifying the app’s UI so that the search function isn’t just a dumb field that blindly searches everything, but rather a popup that provides options: which columns to search, whether to be diacritic insensitive, etc.

For the slow route, I’ll toss the query to a background actor and return a set of primary IDs. I’ll re-query for those objects on the main thread so I can get a live collection to power the table view.

Out of curiosity, do the .where{} and .filter() approaches hit the same code path in Realm? Or is the older NSPredicate-based approach less efficient? In this case, with 30 properties, I have a loop that constructs an OR compound predicate from the property names, but I’m willing to type them all out if .where{} is a more optimized path.
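For readers following along, the loop described above might look roughly like this. It is a sketch only: `makeSearchPredicate` and the property names are hypothetical, and the `modifiers` parameter is an assumption added so the same function can build either a CONTAINS[cd] search or a plain CONTAINS against pre-normalized columns.

```swift
import Foundation

/// Hypothetical helper: builds a single OR predicate that checks `term`
/// against every property in `propertyNames`.
func makeSearchPredicate(term: String,
                         propertyNames: [String],
                         modifiers: String = "[cd]") -> NSPredicate {
    // %K substitutes a key path and %@ a value, so neither needs manual quoting.
    let subpredicates = propertyNames.map { name in
        NSPredicate(format: "%K CONTAINS\(modifiers) %@", name, term)
    }
    return NSCompoundPredicate(orPredicateWithSubpredicates: subpredicates)
}

// Example with two made-up column names.
let searchPredicate = makeSearchPredicate(term: "cafe",
                                          propertyNames: ["title", "artist"])
print(searchPredicate.predicateFormat)
```

The resulting predicate can be passed to Realm’s predicate-based `.filter(...)` on a results collection. Passing `modifiers: ""` builds the cheaper plain CONTAINS variant for columns that were normalized at write time.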

I believe they do - our testing shows exactly the same performance either way. We really like the type-safe formatting but the predicate is very flexible - especially being able to build a NSCompoundPredicate as you mentioned.

Just ensure you don’t do this:

.filter { $0.some_property == "some value" } // Swift function with {}: eager, pulls every object into memory

and that you do this instead:

.filter(your_predicate) // Realm function with (): safe and lazy-loading
